Multiple Instance Learning (MIL) methods are mainstream approaches for pathological image classification and analysis. The CAMELYON-16/17 datasets are commonly used to evaluate MIL methods. However, they have the following issues:
- CAMELYON-16/17 datasets contain some problematic slides
- The pixel-level annotations of the CAMELYON-16/17 test sets are not accurate enough
- Different MIL methods do not share a unified dataset split or set of evaluation metrics on the CAMELYON datasets
- To conclude, there is no BENCHMARK for MIL methods
We do the following work to establish the CAMELYON+ BENCHMARK:
- Remove some problematic slides.
- Correct problematic annotations.
- Merge the corrected versions of the **CAMELYON-16/17** datasets into the CAMELYON+ dataset.
- Evaluate mainstream MIL methods on the CAMELYON+ dataset.
- Evaluate mainstream feature extractors on the CAMELYON+ dataset.
- Use more comprehensive evaluation metrics to assess different methods.
- In summary, we establish a new CAMELYON+ BENCHMARK.
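
To illustrate what reporting several metrics for one set of slide-level predictions can look like, here is a minimal sketch in plain Python. It is an illustration only, not the benchmark's evaluation code: the choice of metrics (accuracy, macro-F1, AUC), the binary tumor/normal labels, the 0.5 decision threshold, and the function names are all assumptions made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of slides whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def binary_auc(y_true, scores):
    """AUC as the probability that a positive slide outscores a negative one.

    Ties count as half a win. Assumes labels are 0 (normal) and 1 (tumor).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative slide")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Report all three metrics for one set of slide-level tumor scores."""
    y_pred = [int(s >= threshold) for s in scores]
    return {
        "accuracy": accuracy(y_true, y_pred),
        "macro_f1": macro_f1(y_true, y_pred),
        "auc": binary_auc(y_true, scores),
    }


# Hypothetical labels and scores for six slides (not real benchmark data).
labels = [0, 0, 1, 1, 1, 0]
tumor_scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]
report = evaluate(labels, tumor_scores)
```

Reporting threshold-dependent metrics (accuracy, macro-F1) alongside a threshold-free one (AUC) makes it harder for a single favorable operating point to hide weaknesses of a method.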