Given the object’s mesh and different sensory observations of the contact position (visual images, impact sounds, or tactile readings), this task aims to predict the coordinates of the mesh vertex at which the contact happens.
More formally, the task is defined as follows: given a visual patch image V (i.e., an image of the region near the object’s surface), and/or a tactile reading T, and/or an impact sound S, together with the shape of the object P (represented as a point cloud), the model must localize the contact position C on the point cloud.
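Because the prediction must lie on the object's point cloud, a model's continuous 3D output is typically snapped to the nearest point. The sketch below illustrates this snapping step with NumPy; it is our illustration of the idea, not necessarily the benchmark's exact evaluation protocol:

```python
import numpy as np

def localize_contact(pred_xyz, point_cloud):
    """Return the index and coordinates of the point-cloud vertex
    closest (in Euclidean distance) to a predicted 3D position."""
    diffs = point_cloud - np.asarray(pred_xyz, dtype=float)  # (N, 3)
    # Squared distances suffice for argmin; avoids a sqrt.
    sq_dists = np.einsum('ij,ij->i', diffs, diffs)
    idx = int(np.argmin(sq_dists))
    return idx, point_cloud[idx]
```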
The dataset used to train the baseline models can be downloaded here.
Start the training process, and test the best model on the test set after training:

```sh
python main.py --batch_size 8 --modality_list vision touch audio \
    --model CLR --weight_decay 1e-2 --lr 5e-4 \
    --exp CLR_vision_touch_audio
```
Evaluate the best model in CLR_vision_touch_audio:

```sh
python main.py --batch_size 8 --modality_list vision touch audio \
    --model CLR --weight_decay 1e-2 --lr 5e-4 \
    --exp CLR_vision_touch_audio \
    --eval
```
To train and test your own model on the ObjectFolder Contact Localization Benchmark, you only need to modify a few files under models. You may follow these simple steps:
- Create a new model directory:

  ```sh
  mkdir models/my_model
  ```

- Design the new model:

  ```sh
  cd models/my_model
  touch my_model.py
  ```

- Build the new model and its optimizer by adding the following code to models/build.py:

  ```python
  elif args.model == 'my_model':
      from my_model import my_model
      model = my_model.my_model(args)
      optimizer = optim.AdamW(model.parameters(),
                              lr=args.lr,
                              weight_decay=args.weight_decay)
  ```

- Add the new model into the pipeline. Once the new model is built, it can be trained and evaluated in the same way:

  ```sh
  python main.py --modality_list vision touch audio \
      --model my_model \
      --exp my_model
  ```
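To make the steps above concrete, here is a minimal sketch of what models/my_model/my_model.py could contain, assuming the repo's PyTorch setup. The feature dimensions and the fusion head are hypothetical; the only interface the build.py snippet actually requires is a `my_model(args)` constructor whose instance exposes `parameters()`:

```python
import torch
import torch.nn as nn

class my_model(nn.Module):
    """Hypothetical multimodal fusion model: concatenates per-modality
    features and regresses a 3D contact position."""

    def __init__(self, args=None, feat_dim=128, n_modalities=3):
        super().__init__()
        # feat_dim and n_modalities are illustrative defaults, not
        # values taken from the benchmark code.
        self.head = nn.Sequential(
            nn.Linear(n_modalities * feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, modality_feats):
        # modality_feats: list of (B, feat_dim) tensors, one per modality.
        return self.head(torch.cat(modality_feats, dim=-1))
```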
In our experiments, we manually select 50 objects with rich surface features from the dataset and sample 1,000 contacts from each object. The sampling strategy is based on surface curvature, under the assumption that vertex curvatures follow a uniform distribution. We first compute the average vertex curvature; vertices whose curvature deviates further from the average are then sampled with higher probability (i.e., vertices with more distinctive surface patterns are more likely to be sampled).
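The curvature-weighted sampling described above can be sketched as follows. This is a minimal NumPy illustration: weighting each vertex by its absolute deviation from the mean curvature is our assumption, not necessarily the exact scheme used in the benchmark:

```python
import numpy as np

def sample_contact_vertices(curvatures, n_samples, rng=None):
    """Sample vertex indices, favoring curvatures far from the mean."""
    rng = np.random.default_rng() if rng is None else rng
    curvatures = np.asarray(curvatures, dtype=float)
    # Deviation from the average curvature drives the sampling weight.
    deviation = np.abs(curvatures - curvatures.mean())
    # A small epsilon keeps every vertex reachable.
    weights = deviation + 1e-8
    probs = weights / weights.sum()
    return rng.choice(len(curvatures), size=n_samples, p=probs)
```

For instance, sampling 1,000 contacts from an object would be `sample_contact_vertices(curv, 1000)`, where `curv` holds one curvature value per mesh vertex.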
In the experiments, we randomly split the 1,000 instances of each object into train/val/test splits of 800/190/10, respectively. Similarly, in the real-world experiments, we choose 53 objects from ObjectFolder Real and randomly split the instances of each object by 8:1:1.
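A per-object split like the 8:1:1 one above can be reproduced with a simple shuffled partition. This is a sketch under the assumption of a seeded random permutation; the actual split indices used in the benchmark are not specified here:

```python
import numpy as np

def split_instances(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly partition instance indices 0..n-1 into train/val/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```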
| Method          | Vision | Touch | Audio | V+T   | V+A   | T+A   | V+T+A |
| --------------- | ------ | ----- | ----- | ----- | ----- | ----- | ----- |
| RANDOM          | 47.32  | 47.32 | 47.32 | 47.32 | 47.32 | 47.32 | 47.32 |
| Point Filtering | -      | 4.21  | 1.45  | -     | -     | 3.73  | -     |
| MCR             | 5.03   | 23.59 | 4.85  | 4.84  | 1.76  | 3.89  | 1.84  |
| Method | Vision | Touch | Audio | Fusion |
| ------ | ------ | ----- | ----- | ------ |
| RANDOM | 50.57  | 50.57 | 50.57 | 50.57  |
| MCR    | 12.30  | 32.03 | 35.62 | 12.00  |