Fix discounted returns calculation for Cal-QL (tinkoff-ai#63)
* fix discounted_returns calculation for Cal-QL

* Update README.md

* Update results

---------

Co-authored-by: Denis Tarasov <[email protected]>
nakamotoo and Denis Tarasov authored Jul 10, 2023
1 parent 6c5488f commit 962688b
Showing 15 changed files with 115 additions and 80 deletions.
40 changes: 20 additions & 20 deletions README.md
@@ -171,35 +171,35 @@ You can check the links above for learning curves and details. Here, we report r
#### Scores
| **Task-Name** |AWAC|CQL|IQL|SPOT|Cal-QL|
|---------------------------|------------|--------|--------|-----|-----|
| antmaze-umaze-v2 |52.75 ± 8.67 → 98.75 ± 1.09|94.00 ± 1.58 → 99.50 ± 0.87|77.00 ± 0.71 → 96.50 ± 1.12|91.00 ± 2.55 → 99.50 ± 0.50|65.75 ± 3.49 → 99.75 ± 0.43|
| antmaze-umaze-diverse-v2 |56.00 ± 2.74 → 0.00 ± 0.00|9.50 ± 9.91 → 99.00 ± 1.22|59.50 ± 9.55 → 63.75 ± 25.02|36.25 ± 2.17 → 95.00 ± 3.67|48.75 ± 3.83 → 97.50 ± 3.77|
| antmaze-medium-play-v2 |0.00 ± 0.00 → 0.00 ± 0.00|59.00 ± 11.18 → 97.75 ± 1.30|71.75 ± 2.95 → 89.75 ± 1.09|67.25 ± 10.47 → 97.25 ± 1.30|0.00 ± 0.00 → 98.50 ± 0.87|
| antmaze-medium-diverse-v2 |0.00 ± 0.00 → 0.00 ± 0.00|63.50 ± 6.84 → 97.25 ± 1.92|64.25 ± 1.92 → 92.25 ± 2.86|73.75 ± 7.29 → 94.50 ± 1.66|1.25 ± 0.83 → 96.25 ± 3.63|
| antmaze-large-play-v2 |0.00 ± 0.00 → 0.00 ± 0.00|28.75 ± 7.76 → 88.25 ± 2.28|38.50 ± 8.73 → 64.50 ± 17.04|31.50 ± 12.58 → 87.00 ± 3.24|0.25 ± 0.43 → 92.25 ± 3.70|
| antmaze-large-diverse-v2 |0.00 ± 0.00 → 0.00 ± 0.00|35.50 ± 3.64 → 91.75 ± 3.96|26.75 ± 3.77 → 64.25 ± 4.15|17.50 ± 7.26 → 81.00 ± 14.14|0.00 ± 0.00 → 89.75 ± 2.59|
|antmaze-umaze-v2|52.75 ± 8.67 → 98.75 ± 1.09|94.00 ± 1.58 → 99.50 ± 0.87|77.00 ± 0.71 → 96.50 ± 1.12|91.00 ± 2.55 → 99.50 ± 0.50|76.75 ± 7.53 → 99.75 ± 0.43|
|antmaze-umaze-diverse-v2|56.00 ± 2.74 → 0.00 ± 0.00|9.50 ± 9.91 → 99.00 ± 1.22|59.50 ± 9.55 → 63.75 ± 25.02|36.25 ± 2.17 → 95.00 ± 3.67|32.00 ± 27.79 → 98.50 ± 1.12|
|antmaze-medium-play-v2|0.00 ± 0.00 → 0.00 ± 0.00|59.00 ± 11.18 → 97.75 ± 1.30|71.75 ± 2.95 → 89.75 ± 1.09|67.25 ± 10.47 → 97.25 ± 1.30|71.75 ± 3.27 → 98.75 ± 1.64|
|antmaze-medium-diverse-v2|0.00 ± 0.00 → 0.00 ± 0.00|63.50 ± 6.84 → 97.25 ± 1.92|64.25 ± 1.92 → 92.25 ± 2.86|73.75 ± 7.29 → 94.50 ± 1.66|62.00 ± 4.30 → 98.25 ± 1.48|
|antmaze-large-play-v2|0.00 ± 0.00 → 0.00 ± 0.00|28.75 ± 7.76 → 88.25 ± 2.28|38.50 ± 8.73 → 64.50 ± 17.04|31.50 ± 12.58 → 87.00 ± 3.24|31.75 ± 8.87 → 97.25 ± 1.79|
|antmaze-large-diverse-v2|0.00 ± 0.00 → 0.00 ± 0.00|35.50 ± 3.64 → 91.75 ± 3.96|26.75 ± 3.77 → 64.25 ± 4.15|17.50 ± 7.26 → 81.00 ± 14.14|44.00 ± 8.69 → 91.50 ± 3.91|
| | | | | | | | | | |
| **antmaze average** |18.12 → 16.46|48.38 → 95.58|56.29 → 78.50|52.88 → 92.38|19.33 → 95.67|
| **antmaze average** |18.12 → 16.46|48.38 → 95.58|56.29 → 78.50|52.88 → 92.38|53.04 → 97.33|
| | | | | | | | | | |
| pen-cloned-v1 |88.66 ± 15.10 → 86.82 ± 11.12|-2.76 ± 0.08 → -1.28 ± 2.16|84.19 ± 3.96 → 102.02 ± 20.75|6.19 ± 5.21 → 43.63 ± 20.09|-2.64 ± 0.14 → 0.04 ± 3.95|
| door-cloned-v1 |0.93 ± 1.66 → 0.01 ± 0.00|-0.33 ± 0.01 → -0.33 ± 0.01|1.19 ± 0.93 → 20.34 ± 9.32|-0.21 ± 0.14 → 0.02 ± 0.31|-0.33 ± 0.01 → -0.33 ± 0.01|
| hammer-cloned-v1 |1.80 ± 3.01 → 0.24 ± 0.04|0.56 ± 0.55 → 2.85 ± 4.81|1.35 ± 0.32 → 57.27 ± 28.49|3.97 ± 6.39 → 3.73 ± 4.99|0.27 ± 0.01 → 0.14 ± 0.15|
| relocate-cloned-v1 |-0.04 ± 0.04 → -0.04 ± 0.01|-0.33 ± 0.01 → -0.33 ± 0.01|0.04 ± 0.04 → 0.32 ± 0.38|-0.24 ± 0.01 → -0.15 ± 0.05|-0.33 ± 0.01 → -0.33 ± 0.00|
|pen-cloned-v1|88.66 ± 15.10 → 86.82 ± 11.12|-2.76 ± 0.08 → -1.28 ± 2.16|84.19 ± 3.96 → 102.02 ± 20.75|6.19 ± 5.21 → 43.63 ± 20.09|-2.66 ± 0.04 → -2.68 ± 0.12|
|door-cloned-v1|0.93 ± 1.66 → 0.01 ± 0.00|-0.33 ± 0.01 → -0.33 ± 0.01|1.19 ± 0.93 → 20.34 ± 9.32|-0.21 ± 0.14 → 0.02 ± 0.31|-0.33 ± 0.01 → -0.33 ± 0.01|
|hammer-cloned-v1|1.80 ± 3.01 → 0.24 ± 0.04|0.56 ± 0.55 → 2.85 ± 4.81|1.35 ± 0.32 → 57.27 ± 28.49|3.97 ± 6.39 → 3.73 ± 4.99|0.25 ± 0.04 → 0.17 ± 0.17|
|relocate-cloned-v1|-0.04 ± 0.04 → -0.04 ± 0.01|-0.33 ± 0.01 → -0.33 ± 0.01|0.04 ± 0.04 → 0.32 ± 0.38|-0.24 ± 0.01 → -0.15 ± 0.05|-0.31 ± 0.05 → -0.31 ± 0.04|
| | | | | | | | | | |
| **adroit average** |22.84 → 21.76|-0.72 → 0.22|21.69 → 44.99|2.43 → 11.81|-0.76 → -0.12|
| **adroit average** |22.84 → 21.76|-0.72 → 0.22|21.69 → 44.99|2.43 → 11.81|-0.76 → -0.79|
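As a sanity check, the updated Cal-QL antmaze averages follow directly from the per-task rows above (a small illustrative snippet, not part of the repo):

```python
# Reproduce the updated Cal-QL antmaze averages from the per-task scores above.
before = [76.75, 32.00, 71.75, 62.00, 31.75, 44.00]
after = [99.75, 98.50, 98.75, 98.25, 97.25, 91.50]
print(round(sum(before) / len(before), 2))  # 53.04
print(round(sum(after) / len(after), 2))    # 97.33
```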

#### Regrets
| **Task-Name** |AWAC|CQL|IQL|SPOT|Cal-QL|
|---------------------------|------------|--------|--------|-----|-----|
|antmaze-umaze-v2|0.04 ± 0.01|0.02 ± 0.00|0.07 ± 0.00|0.02 ± 0.00|0.02 ± 0.00|
|antmaze-umaze-diverse-v2|0.88 ± 0.01|0.09 ± 0.01|0.43 ± 0.11|0.22 ± 0.07|0.04 ± 0.01|
|antmaze-medium-play-v2|1.00 ± 0.00|0.08 ± 0.01|0.09 ± 0.01|0.06 ± 0.00|0.08 ± 0.01|
|antmaze-medium-diverse-v2|1.00 ± 0.00|0.08 ± 0.00|0.10 ± 0.01|0.05 ± 0.01|0.08 ± 0.01|
|antmaze-large-play-v2|1.00 ± 0.00|0.21 ± 0.02|0.34 ± 0.05|0.29 ± 0.07|0.29 ± 0.04|
|antmaze-large-diverse-v2|1.00 ± 0.00|0.21 ± 0.03|0.41 ± 0.03|0.23 ± 0.08|0.29 ± 0.06|
|antmaze-umaze-v2|0.04 ± 0.01|0.02 ± 0.00|0.07 ± 0.00|0.02 ± 0.00|0.01 ± 0.00|
|antmaze-umaze-diverse-v2|0.88 ± 0.01|0.09 ± 0.01|0.43 ± 0.11|0.22 ± 0.07|0.05 ± 0.01|
|antmaze-medium-play-v2|1.00 ± 0.00|0.08 ± 0.01|0.09 ± 0.01|0.06 ± 0.00|0.04 ± 0.01|
|antmaze-medium-diverse-v2|1.00 ± 0.00|0.08 ± 0.00|0.10 ± 0.01|0.05 ± 0.01|0.04 ± 0.01|
|antmaze-large-play-v2|1.00 ± 0.00|0.21 ± 0.02|0.34 ± 0.05|0.29 ± 0.07|0.13 ± 0.02|
|antmaze-large-diverse-v2|1.00 ± 0.00|0.21 ± 0.03|0.41 ± 0.03|0.23 ± 0.08|0.13 ± 0.02|
| | | | | | | | | | |
| **antmaze average** |0.82|0.11|0.24|0.15|0.13|
| **antmaze average** |0.82|0.11|0.24|0.15|0.07|
| | | | | | | | | | |
|pen-cloned-v1|0.46 ± 0.02|0.97 ± 0.00|0.37 ± 0.01|0.58 ± 0.02|0.97 ± 0.01|
|pen-cloned-v1|0.46 ± 0.02|0.97 ± 0.00|0.37 ± 0.01|0.58 ± 0.02|0.98 ± 0.01|
|door-cloned-v1|1.00 ± 0.00|1.00 ± 0.00|0.83 ± 0.03|0.99 ± 0.01|1.00 ± 0.00|
|hammer-cloned-v1|1.00 ± 0.00|1.00 ± 0.00|0.65 ± 0.10|0.98 ± 0.01|1.00 ± 0.00|
|relocate-cloned-v1|1.00 ± 0.00|1.00 ± 0.00|1.00 ± 0.00|1.00 ± 0.00|1.00 ± 0.00|
63 changes: 44 additions & 19 deletions algorithms/finetune/cal_ql.py
@@ -66,6 +66,7 @@ class TrainConfig:
reward_bias: float = 0.0 # Reward bias for normalization
# Cal-QL
mixing_ratio: float = 0.5 # Data mixing ratio for online tuning
is_sparse_reward: bool = False # Use sparse reward
# Wandb logging
project: str = "CORL"
group: str = "Cal-QL-D4RL"
@@ -271,34 +272,47 @@ def return_reward_range(dataset: Dict, max_episode_steps: int) -> Tuple[float, f
return min(returns), max(returns)


def get_return_to_go(dataset: Dict, gamma: float, max_episode_steps: int) -> List[float]:
def get_return_to_go(dataset: Dict, env: gym.Env, config: TrainConfig) -> np.ndarray:
returns = []
ep_ret, ep_len = 0.0, 0
cur_rewards = []
terminals = []
for r, d in zip(dataset["rewards"], dataset["terminals"]):
N = len(dataset["rewards"])
for t, (r, d) in enumerate(zip(dataset["rewards"], dataset["terminals"])):
ep_ret += float(r)
cur_rewards.append(float(r))
terminals.append(float(d))
ep_len += 1
if d or ep_len == max_episode_steps:
is_last_step = (
(t == N - 1)
or ( # noqa
np.linalg.norm(
dataset["observations"][t + 1] - dataset["next_observations"][t]
)
> 1e-6 # noqa
)
or ep_len == env._max_episode_steps # noqa
)

if d or is_last_step:
discounted_returns = [0] * ep_len
prev_return = 0
for i in reversed(range(ep_len)):
discounted_returns[i] = cur_rewards[i] + gamma * prev_return * (
1 - terminals[i]
)
prev_return = discounted_returns[i]
if (
config.is_sparse_reward
and r # noqa
== env.ref_min_score * config.reward_scale + config.reward_bias # noqa
):
discounted_returns = [r / (1 - config.discount)] * ep_len
else:
for i in reversed(range(ep_len)):
discounted_returns[i] = cur_rewards[
i
] + config.discount * prev_return * (1 - terminals[i])
prev_return = discounted_returns[i]
returns += discounted_returns
ep_ret, ep_len = 0.0, 0
cur_rewards = []
terminals = []
discounted_returns = [0] * ep_len
prev_return = 0
for i in reversed(range(ep_len)):
discounted_returns[i] = cur_rewards[i] + gamma * prev_return * (1 - terminals[i])
prev_return = discounted_returns[i]
returns += discounted_returns
return returns

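For readers who want the fixed logic in one place, here is a minimal, self-contained sketch of the corrected return-to-go computation. The signature is simplified for illustration: `sparse_terminal_reward` stands in for `env.ref_min_score * config.reward_scale + config.reward_bias`, `max_episode_steps` for `env._max_episode_steps`, and the function name is not the repo's.

```python
from typing import Dict, Optional

import numpy as np


def return_to_go_sketch(
    dataset: Dict[str, np.ndarray],
    gamma: float,
    max_episode_steps: int,
    sparse_terminal_reward: Optional[float] = None,
) -> np.ndarray:
    """Monte-Carlo return-to-go per transition, mirroring the fixed logic above.

    Episode boundaries are detected from terminal flags, the episode-step cap,
    and a mismatch between observations[t + 1] and next_observations[t]
    (trajectories stored back-to-back without terminal flags).
    If sparse_terminal_reward is given and the episode ends at that reward
    level, the return is the closed-form geometric series r / (1 - gamma).
    """
    rewards, terminals = dataset["rewards"], dataset["terminals"]
    n = len(rewards)
    returns, ep_rewards, ep_terminals = [], [], []
    for t, (r, d) in enumerate(zip(rewards, terminals)):
        ep_rewards.append(float(r))
        ep_terminals.append(float(d))
        boundary = (
            t == n - 1
            or np.linalg.norm(
                dataset["observations"][t + 1] - dataset["next_observations"][t]
            )
            > 1e-6
            or len(ep_rewards) == max_episode_steps
        )
        if d or boundary:
            ep_len = len(ep_rewards)
            if sparse_terminal_reward is not None and float(r) == sparse_terminal_reward:
                # Sparse-reward episode stuck at the "failure" reward level:
                # treat it as absorbing and use the closed-form discounted sum.
                discounted = [float(r) / (1 - gamma)] * ep_len
            else:
                # Standard backward recursion over the finished episode.
                discounted = [0.0] * ep_len
                prev = 0.0
                for i in reversed(range(ep_len)):
                    discounted[i] = ep_rewards[i] + gamma * prev * (1 - ep_terminals[i])
                    prev = discounted[i]
            returns += discounted
            ep_rewards, ep_terminals = [], []
    return np.array(returns, dtype=np.float32)
```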

@@ -352,7 +366,10 @@ def init_module_weights(module: torch.nn.Module, orthogonal_init: bool = False):

class ReparameterizedTanhGaussian(nn.Module):
def __init__(
self, log_std_min: float = -20.0, log_std_max: float = 2.0, no_tanh: bool = False
self,
log_std_min: float = -20.0,
log_std_max: float = 2.0,
no_tanh: bool = False,
):
super().__init__()
self.log_std_min = log_std_min
@@ -373,7 +390,10 @@ def log_prob(
return torch.sum(action_distribution.log_prob(sample), dim=-1)

def forward(
self, mean: torch.Tensor, log_std: torch.Tensor, deterministic: bool = False
self,
mean: torch.Tensor,
log_std: torch.Tensor,
deterministic: bool = False,
) -> Tuple[torch.Tensor, torch.Tensor]:
log_std = torch.clamp(log_std, self.log_std_min, self.log_std_max)
std = torch.exp(log_std)
@@ -984,7 +1004,7 @@ def train(config: TrainConfig):
reward_scale=config.reward_scale,
reward_bias=config.reward_bias,
)
mc_returns = get_return_to_go(dataset, config.discount, max_episode_steps=1000)
mc_returns = get_return_to_go(dataset, env, config)
dataset["mc_returns"] = np.array(mc_returns)
assert len(dataset["mc_returns"]) == len(dataset["rewards"])

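The sparse-reward branch above replaces the backward recursion with the closed form r / (1 - gamma). A quick illustrative check (the values below are placeholders, not taken from any particular config):

```python
# Illustrative only: for an absorbing episode stuck at a constant reward r,
# the discounted return equals the infinite geometric series r / (1 - gamma).
gamma, r = 0.99, -5.0
closed_form = r / (1 - gamma)                        # -500.0
series = sum(r * gamma ** k for k in range(10_000))  # truncated series
print(abs(closed_form - series) < 1e-6)              # True
```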
@@ -1044,7 +1064,10 @@ def train(config: TrainConfig):
critic_2_optimizer = torch.optim.Adam(list(critic_2.parameters()), config.qf_lr)

actor = TanhGaussianPolicy(
state_dim, action_dim, max_action, orthogonal_init=config.orthogonal_init
state_dim,
action_dim,
max_action,
orthogonal_init=config.orthogonal_init,
).to(config.device)
actor_optimizer = torch.optim.Adam(actor.parameters(), config.policy_lr)

@@ -1112,7 +1135,9 @@ def train(config: TrainConfig):
episode_step += 1
action, _ = actor(
torch.tensor(
state.reshape(1, -1), device=config.device, dtype=torch.float32
state.reshape(1, -1),
device=config.device,
dtype=torch.float32,
)
)
action = action.cpu().data.numpy().flatten()
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/antmaze/large_diverse_v2.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 5
reward_scale: 10.0
reward_bias: -5.0
use_automatic_entropy_tuning: true
is_sparse_reward: true
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/antmaze/large_play_v2.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 5
reward_scale: 10.0
reward_bias: -5.0
use_automatic_entropy_tuning: true
is_sparse_reward: true
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/antmaze/medium_diverse_v2.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 5
reward_scale: 10.0
reward_bias: -5.0
use_automatic_entropy_tuning: true
is_sparse_reward: true
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/antmaze/medium_play_v2.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 5
reward_scale: 10.0
reward_bias: -5.0
use_automatic_entropy_tuning: true
is_sparse_reward: true
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/antmaze/umaze_diverse_v2.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 5
reward_scale: 10.0
reward_bias: -5.0
use_automatic_entropy_tuning: true
is_sparse_reward: true
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/antmaze/umaze_v2.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 5
reward_scale: 10.0
reward_bias: -5.0
use_automatic_entropy_tuning: true
is_sparse_reward: true
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/door/cloned_v1.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 3
reward_scale: 1.0
reward_bias: 0.0
use_automatic_entropy_tuning: true
is_sparse_reward: false
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/hammer/cloned_v1.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 3
reward_scale: 1.0
reward_bias: 0.0
use_automatic_entropy_tuning: true
is_sparse_reward: false
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/pen/cloned_v1.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 3
reward_scale: 1.0
reward_bias: 0.0
use_automatic_entropy_tuning: true
is_sparse_reward: false
1 change: 1 addition & 0 deletions configs/finetune/cal_ql/relocate/cloned_v1.yaml
@@ -38,3 +38,4 @@ q_n_hidden_layers: 3
reward_scale: 1.0
reward_bias: 0.0
use_automatic_entropy_tuning: true
is_sparse_reward: false
Binary file modified results/bin/finetune_scores.pickle
2 changes: 1 addition & 1 deletion results/get_finetune_urls.py
@@ -39,7 +39,7 @@ def get_urls(sweep_id, algo_name):

get_urls("tlab/CORL/sweeps/ucrmi909", "IQL")

get_urls("tlab/CORL/sweeps/vwtp6lyr", "Cal-QL")
get_urls("tlab/CORL/sweeps/efvz7d68", "Cal-QL")

dataframe = pd.DataFrame(collected_urls)

80 changes: 40 additions & 40 deletions results/runs_tables/finetune_urls.csv
@@ -159,43 +159,43 @@ IQL,antmaze-medium-diverse-v2,tlab/CORL/runs/vaa56ykf
IQL,antmaze-medium-diverse-v2,tlab/CORL/runs/van7r2au
IQL,antmaze-medium-diverse-v2,tlab/CORL/runs/gifi8vh6
IQL,antmaze-medium-diverse-v2,tlab/CORL/runs/8y5gwfhm
Cal-QL,door-cloned-v1,tlab/CORL/runs/futdbwzy
Cal-QL,door-cloned-v1,tlab/CORL/runs/v3fhk9b5
Cal-QL,door-cloned-v1,tlab/CORL/runs/7e60g2i8
Cal-QL,door-cloned-v1,tlab/CORL/runs/mswbah5p
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/pe4wxtsp
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/z6sglkbg
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/7veukzqq
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/9dbm4j72
Cal-QL,pen-cloned-v1,tlab/CORL/runs/aydykrtg
Cal-QL,pen-cloned-v1,tlab/CORL/runs/ftv28qdn
Cal-QL,pen-cloned-v1,tlab/CORL/runs/kn94slpt
Cal-QL,pen-cloned-v1,tlab/CORL/runs/e6e8n3um
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/gyc24rli
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/jauow6em
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/zvwjrdbe
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/and7cw69
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/29s87v49
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/5pywzh49
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/ral60pcj
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/e71mt108
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/6zarya8d
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/9o5u46q5
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/ul988oc7
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/0x2j0ke5
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/vj9qwgaa
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/jeylkhza
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/ebwphwdp
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/c6u69n27
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/3urux3jm
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/xhrnoc9v
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/iql5bt5q
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/as4x0ctu
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/42gnusy5
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/3acs82zb
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/toj8wfm6
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/j11u9vzp
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/qzl8m5kt
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/f4epmc88
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/rzy4fij2
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/uy2txwue
Cal-QL,door-cloned-v1,tlab/CORL/runs/oi1ig0ri
Cal-QL,door-cloned-v1,tlab/CORL/runs/i069hyd7
Cal-QL,door-cloned-v1,tlab/CORL/runs/rhhdlroq
Cal-QL,door-cloned-v1,tlab/CORL/runs/eicij2jh
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/kusvjf0g
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/1lqi4sg9
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/2fu95t4k
Cal-QL,hammer-cloned-v1,tlab/CORL/runs/7wkikqpn
Cal-QL,pen-cloned-v1,tlab/CORL/runs/csoban2m
Cal-QL,pen-cloned-v1,tlab/CORL/runs/fj45ivs8
Cal-QL,pen-cloned-v1,tlab/CORL/runs/o0y2q02v
Cal-QL,pen-cloned-v1,tlab/CORL/runs/hzq011ab
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/c1csqi8s
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/30r23nbv
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/ywe1cfqa
Cal-QL,relocate-cloned-v1,tlab/CORL/runs/kc7mgqh5
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/d5f3ul52
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/fjsryl4k
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/z781tlua
Cal-QL,antmaze-umaze-v2,tlab/CORL/runs/mbpoixey
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/d2gndjad
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/kqxyllfa
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/qaowm0ds
Cal-QL,antmaze-medium-play-v2,tlab/CORL/runs/ybpehr4w
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/xamd4zxj
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/a015fjb1
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/1pu06s2i
Cal-QL,antmaze-umaze-diverse-v2,tlab/CORL/runs/iwa1o31k
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/yvqv3mxa
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/4myjeu5g
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/6ptdr78l
Cal-QL,antmaze-large-diverse-v2,tlab/CORL/runs/8ix0469p
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/4chdwkua
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/fzrlcnwp
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/f9hz4fal
Cal-QL,antmaze-large-play-v2,tlab/CORL/runs/fpq2ob8q
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/zhf7tr7p
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/m02ew5oy
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/9r1a0trx
Cal-QL,antmaze-medium-diverse-v2,tlab/CORL/runs/ds2dbx2u
