Code for the paper "Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection".
Install the dependencies:

```bash
pip install -r requirements.txt
```
Set your OpenAI API key:

```bash
export OPENAI_API_KEY=your_api_key
```
The `data/` directory should be organized as follows:
```
data
├── frames
│   ├── color
│   │   ├── 0.png
│   │   ├── 20.png
│   │   └── ...
├── referit3d
│   ├── annotations
│   ├── scan_data
├── symbolic_exp
│   ├── nr3d.jsonl
│   ├── scanrefer.json
├── test_data
│   ├── above
│   ├── behind
│   ├── ...
├── seg
├── nr3d_masks
├── scanrefer_masks
├── feats_3d.pkl
├── tables.pkl
```
- `frames`: RGB images of the scenes. download_link
- `referit3d`: processed referit3d dataset from vil3dref.
- `symbolic_exp`: symbolic expressions.
- `test_data`: test data for code generation.
- `seg`: segmentation results of 3D point clouds for ScanRefer. download_link
- `nr3d_masks`: 2D GT object masks. download_link
- `scanrefer_masks`: 2D predicted object masks. download_link
- `feats_3d.pkl`: predicted object labels for Nr3D, from ZSVG3D.
- `tables.pkl`: tables for code generation. download_link
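If you want to check that everything is in place before running anything, here is a minimal sanity-check sketch. The paths come from the tree above; the contents of the two pickle files are not documented here, so the script only reports their top-level Python types:

```python
import os
import pickle

DATA_ROOT = "data"

# Paths taken from the directory tree above.
expected = [
    "frames/color",
    "referit3d/annotations",
    "referit3d/scan_data",
    "symbolic_exp/nr3d.jsonl",
    "symbolic_exp/scanrefer.json",
    "test_data",
    "seg",
    "nr3d_masks",
    "scanrefer_masks",
    "feats_3d.pkl",
    "tables.pkl",
]

for rel_path in expected:
    path = os.path.join(DATA_ROOT, rel_path)
    status = "OK     " if os.path.exists(path) else "MISSING"
    print(f"{status} {path}")

# Peek at the pickle files; their internal structure is not documented here,
# so we only print the top-level type of each loaded object.
for name in ("feats_3d.pkl", "tables.pkl"):
    path = os.path.join(DATA_ROOT, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            obj = pickle.load(f)
        print(name, "->", type(obj).__name__)
```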
Run `src/relation_encoders/run_optim.py` to generate relation encoders for the following relations: `left`, `right`, `between`, `corner`, `above`, `below`, and `behind`.
After the optimization finishes, the relation encoders and their accuracies on the test cases are written to `data/test_data/{relation_name}/trajs`. You can then select the best relation encoder for each relation for evaluation, or use the relation encoders already provided in `src/relation_encoders`.
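The format of the trajectory files is not specified here, so the following is only a hedged helper for browsing what the optimization produced per relation (the relation names match the list above; comparing the reported accuracies and picking the best encoder is still a manual step):

```python
import glob
import os

RELATIONS = ["left", "right", "between", "corner", "above", "below", "behind"]

for relation in RELATIONS:
    traj_dir = os.path.join("data", "test_data", relation, "trajs")
    # List whatever the optimizer wrote for this relation; open these files
    # to compare accuracies and choose the encoder to use for evaluation.
    files = sorted(glob.glob(os.path.join(traj_dir, "*")))
    print(f"{relation}: {len(files)} file(s) in {traj_dir}")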
Next, compute the features:

```bash
python -m src.relation_encoders.compute_features \
    --dataset scanrefer \
    --output $OUTPUT_DIR \
    --label pred
```
The `--dataset` option can be `scanrefer` or `nr3d`, and the `--label` option can be `gt` or `pred`. For ScanRefer, only the `pred` label is currently supported, because the standard evaluation protocol provides no GT labels.
After it finishes, the features are saved in `.pth` format in the `$OUTPUT_DIR` directory. You can also download our prepared features: nr3d (pred label), nr3d (gt label), scanrefer.
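If you'd like to peek inside a feature file before evaluation, here is a minimal inspection sketch. The file name matches the Nr3D evaluation command below; the internal structure of the saved object is not documented here, so the snippet only prints what it finds:

```python
import torch

# File name matches the Nr3D evaluation command below; point this at your own
# output (or a downloaded feature file) as needed.
features = torch.load(
    "output/nr3d_features_per_scene_pred_label.pth", map_location="cpu"
)

print(type(features).__name__)
# The file name suggests features are stored per scene, so a dict keyed by
# scene id is a reasonable guess -- but that is an assumption, so just print
# a few entries of whatever structure is actually there.
if isinstance(features, dict):
    for key, value in list(features.items())[:3]:
        print(key, "->", type(value).__name__)
```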
Nr3D Evaluation:

```bash
python -m src.eval.eval_nr3d \
    --features_path output/nr3d_features_per_scene_pred_label.pth \
    --top_k 5 \
    --threshold 0.9 \
    --label_type pred \
    --use_vlm
```
ScanRefer Evaluation:

```bash
python -m src.eval.eval_scanrefer \
    --features_path output/scanrefer_features_per_scene.pth \
    --top_k 5 \
    --threshold 0.1 \
    --use_vlm
```
Change `--features_path` and `--label_type` if you'd like to evaluate on the ground-truth labels. Set `--use_vlm`, `--top_k`, and `--threshold` to use the VLM for evaluation. Please refer to our paper for the meanings of these parameters.
Thanks to the following repositories for their contributions: