ClickMe Heatmap Generator

This project processes ClickMe data for CO3D images, generating heatmaps and analyzing click statistics. It provides tools to compute various correlation metrics, including AUC, cross-entropy, Spearman, and RSA, to evaluate the quality of generated heatmaps.

Download additional files from here

Table of Contents

  • Features
  • Installation
  • Usage
  • Metrics
  • Project Structure
  • Dependencies
  • Contributing
  • License

Features

  • Data Processing: Processes ClickMe CSV or NPZ data to generate clickmaps.
  • Heatmap Generation: Creates heatmaps from click data with optional Gaussian blurring.
  • Correlation Analysis: Calculates split-half correlations and null correlations using various metrics.
  • Visualization: Provides tools for debugging and analysis of clickmaps and heatmaps.
  • Configuration Driven: Easily configurable through YAML configuration files.
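
For a quick illustration of the heatmap-generation feature, here is a minimal sketch, assuming clicks arrive as (x, y) pixel coordinates; the project's actual processing lives in utils.py and may map blur_size to the Gaussian width differently:

import numpy as np
from scipy.ndimage import gaussian_filter

def clicks_to_heatmap(clicks, image_shape=(256, 256), blur_size=21):
    """Accumulate (x, y) clicks into a 2D map and apply Gaussian blurring.

    Hypothetical helper for illustration only.
    """
    heatmap = np.zeros(image_shape, dtype=np.float32)
    for x, y in clicks:
        if 0 <= y < image_shape[0] and 0 <= x < image_shape[1]:
            heatmap[y, x] += 1.0
    # Treat blur_size as an approximate kernel width and derive sigma from it.
    heatmap = gaussian_filter(heatmap, sigma=blur_size / 6.0)
    if heatmap.max() > 0:
        heatmap /= heatmap.max()
    return heatmap

# Example: a tiny set of clicks on a 256x256 image.
example_map = clicks_to_heatmap([(100, 120), (101, 121), (140, 90)])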

Installation

Follow these steps to set up the project:

  1. Clone the Repository

    git clone https://github.com/serre-lab/human_clickme_data_processing.git
    cd human_clickme_data_processing
  2. Create a Virtual Environment

    python3 -m venv venv
    source venv/bin/activate
  3. Install Dependencies

    pip install -r requirements.txt
    python setup.py build_ext --inplace

Usage

Configuration

Before running the scripts, ensure that the configuration file is properly set up. The project uses YAML configuration files located in the configs/ directory. Below is an example of configs/co3d_config.yaml:

experiment_name: co3d
clickme_data: clickme_vCO3D.csv
image_dir: CO3D_ClickMe2
preprocess_db_data: False
blur_size: 21
min_clicks: 10  # Minimum number of clicks for a map to be included
max_clicks: 75  # Maximum number of clicks for a map to be included
min_subjects: 10  # Minimum number of subjects for an image to be included
null_iterations: 10
metric: auc  # Options: AUC, crossentropy, spearman, RSA
image_shape: [256, 256]
center_crop: [224, 224]
display_image_keys:
  - mouse/372_41138_81919_renders_00017.png
  - skateboard/55_3249_9602_renders_00041.png
  - couch/617_99940_198836_renders_00040.png
  - microwave/482_69090_134714_renders_00033.png
  - bottle/601_92782_185443_renders_00030.png
  - kite/399_51022_100078_renders_00049.png
  - carrot/405_54110_105495_renders_00039.png
  - banana/49_2817_7682_renders_00025.png
  - parkingmeter/429_60366_116962_renders_00032.png

Key Configuration Parameters:

  • experiment_name: Name of the experiment.
  • clickme_data: Path to the ClickMe data file (CSV or NPZ).
  • image_dir: Directory containing the images.
  • preprocess_db_data: Boolean indicating whether to preprocess database data.
  • blur_size: Size of the Gaussian blur kernel.
  • min_clicks: Minimum number of clicks required for a map to be included.
  • max_clicks: Maximum number of clicks allowed for a map to be included.
  • min_subjects: Minimum number of subjects required for an image to be included.
  • null_iterations: Number of iterations for null correlation computations.
  • metric: Correlation metric to use (auc, crossentropy, spearman, RSA).
  • image_shape: Shape of the images (height, width).
  • center_crop: Size for center cropping the images.
  • display_image_keys: List of image keys to visualize.
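
Because every script is driven by these YAML files, it can help to load and sanity-check a config with PyYAML (a listed dependency) before launching a run; the scripts themselves read the file passed via --config:

import yaml

# Load a configuration file and inspect a few of its fields.
with open("configs/co3d_config.yaml") as f:
    config = yaml.safe_load(f)

print(config["experiment_name"])  # co3d
print(config["metric"])           # auc
print(config["image_shape"])      # [256, 256]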

Running the Scripts

The repository provides several scripts for processing and analyzing ClickMe data. Below are the primary scripts along with examples using the updated configuration interface.

Compute Human Ceiling Split Half

This script performs split-half correlation analysis on the clickmaps.

Script: compute_human_ceiling_split_half.py

Usage:

python compute_human_ceiling_split_half.py --config configs/co3d_config.yaml

Description:

  • Processes ClickMe data to generate clickmaps.
  • Applies Gaussian blur and optional center cropping.
  • Computes split-half correlations using the specified metric.
  • Optionally visualizes the clickmaps for debugging purposes.
  • Saves the results to human_ceiling_results.npz.
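
The split-half procedure can be sketched as follows, with hypothetical names (subject_maps is assumed to be the per-subject heatmaps for one image, and Spearman stands in for whichever metric the config selects):

import numpy as np
from scipy.stats import spearmanr

def split_half_correlation(subject_maps, rng=None):
    """Randomly split per-subject heatmaps into two halves, average each half,
    and correlate the two mean maps."""
    if rng is None:
        rng = np.random.default_rng(0)
    maps = np.asarray(subject_maps)        # shape: (n_subjects, H, W)
    order = rng.permutation(len(maps))
    half = len(maps) // 2
    mean_a = maps[order[:half]].mean(axis=0)
    mean_b = maps[order[half:]].mean(axis=0)
    rho, _ = spearmanr(mean_a.ravel(), mean_b.ravel())
    return rho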

Compute Human Ceiling Hold One Out

This script performs a hold-one-out correlation analysis on the clickmaps using parallel processing to speed up computations.

Script: compute_human_ceiling_hold_one_out.py

Usage:

python compute_human_ceiling_hold_one_out.py --config configs/co3d_config.yaml

Description:

  • Similar to the split-half script but uses a hold-one-out approach.
  • Utilizes parallel processing with Joblib to compute correlations efficiently.
  • Computes both split-half and null correlations using the specified metric.
  • Saves the results to human_ceiling_results.npz.

Note: Adjust the null_iterations parameter in the configuration file to control the number of null correlation computations.
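
A rough sketch of the hold-one-out scheme with Joblib parallelism (hypothetical helpers; the actual script additionally computes null correlations across mismatched images, controlled by null_iterations):

import numpy as np
from joblib import Parallel, delayed
from scipy.stats import spearmanr

def hold_one_out_score(subject_maps, held_out_idx):
    """Correlate one subject's map with the mean map of all remaining subjects."""
    maps = np.asarray(subject_maps)
    held_out = maps[held_out_idx]
    rest_mean = np.delete(maps, held_out_idx, axis=0).mean(axis=0)
    rho, _ = spearmanr(held_out.ravel(), rest_mean.ravel())
    return rho

def image_ceiling(subject_maps, n_jobs=-1):
    """Average hold-one-out correlation for one image, computed in parallel."""
    scores = Parallel(n_jobs=n_jobs)(
        delayed(hold_one_out_score)(subject_maps, i)
        for i in range(len(subject_maps))
    )
    return float(np.mean(scores))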

Prepare Clickmaps for Modeling

This script prepares the clickmaps for modeling by processing the data and saving the processed clickmaps and medians.

Script: clickme_prepare_maps_for_modeling.py

Usage:

python clickme_prepare_maps_for_modeling.py --config configs/co3d_config.yaml

Description:

  • Processes ClickMe data to generate and prepare clickmaps.
  • Applies Gaussian blur and center cropping as specified in the configuration.
  • Saves the prepared clickmaps and median statistics to the assets/ directory.
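
After a run, the saved outputs can be inspected directly; a minimal sketch with hypothetical file names (the actual paths depend on the assets and processed_medians settings in your config):

import json
import numpy as np

# Median click statistics written by the script (file name set via processed_medians).
with open("assets/processed_medians.json") as f:
    medians = json.load(f)

# One prepared clickmap saved under assets/ (placeholder path).
heatmap = np.load("assets/example_prepared_clickmap.npy")
print(len(medians), heatmap.shape)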

Visualize Clickmaps

This script visualizes the processed clickmaps and their corresponding images.

Script: visualize_clickmaps.py

Usage:

python visualize_clickmaps.py --config configs/co3d_config.yaml

Description:

  • Loads the processed clickmaps.
  • Visualizes the heatmaps alongside the original images.
  • Saves the visualization images to the specified output directory.
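
The kind of overlay this script produces can be sketched with matplotlib and Pillow (both listed dependencies); the image key below comes from the example config, while the clickmap path is a placeholder:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

image = np.asarray(Image.open("CO3D_ClickMe2/mouse/372_41138_81919_renders_00017.png"))
heatmap = np.load("assets/example_clickmap.npy")  # placeholder path

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(image)
ax.imshow(heatmap, cmap="viridis", alpha=0.5)  # semi-transparent heatmap overlay
ax.axis("off")
fig.savefig("examples/overlay.png", bbox_inches="tight")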

Metrics

The project supports the following correlation metrics to evaluate the quality of the generated heatmaps:

  • AUC (Area Under the Curve): Measures the ability of the heatmap to discriminate between relevant and non-relevant regions.
  • Cross-Entropy: Evaluates the difference between the predicted heatmap distribution and the target distribution.
  • Spearman: Computes the Spearman rank-order correlation coefficient between two heatmaps.
  • RSA (Representational Similarity Analysis): Assesses the similarity between two representations by comparing their correlation matrices.

You can specify the desired metric in the configuration file under the metric key.
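
For reference, the Spearman and AUC variants can be approximated in a few lines; this is an illustrative sketch rather than the project's exact implementation in utils.py (here, "relevant" pixels are defined by thresholding the target map at its median):

import numpy as np
from scipy.stats import spearmanr, rankdata

def spearman_score(pred, target):
    """Spearman rank correlation between two flattened heatmaps."""
    rho, _ = spearmanr(pred.ravel(), target.ravel())
    return rho

def auc_score(pred, target):
    """Rank-based AUC (Mann-Whitney formulation): how well the predicted heatmap
    separates pixels above the target's median from the rest."""
    pred, target = pred.ravel(), target.ravel()
    labels = target > np.median(target)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        return float("nan")
    ranks = rankdata(pred)
    return float((ranks[labels].mean() - (n_pos + 1) / 2) / n_neg)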

Project Structure

├── compute_human_ceiling_hold_one_out.py
├── compute_human_ceiling_split_half.py
├── configs/
│   ├── co3d_config.yaml
│   └── jay_imagenet_0.1_config.yaml
├── utils.py
├── clickme_prepare_maps_for_modeling.py
├── visualize_clickmaps.py
├── requirements.txt
├── .gitignore
└── README.md
  • compute_human_ceiling_split_half.py: Script for split-half correlation analysis.
  • compute_human_ceiling_hold_one_out.py: Script for hold-one-out correlation analysis with parallel processing.
  • clickme_prepare_maps_for_modeling.py: Script to prepare clickmaps for modeling.
  • visualize_clickmaps.py: Script to visualize clickmaps alongside images.
  • configs/co3d_config.yaml: Configuration file for the CO3D dataset.
  • configs/jay_imagenet_0.1_config.yaml: Configuration file for the ImageNet dataset.
  • utils.py: Utility functions for processing data, generating heatmaps, and computing metrics.
  • requirements.txt: List of Python dependencies.
  • .gitignore: Specifies files and directories to be ignored by Git.
  • README.md: Project documentation.

Dependencies

The project relies on the following Python packages:

  • numpy
  • pandas
  • Pillow
  • scipy
  • torch
  • matplotlib
  • tqdm
  • joblib
  • torchvision
  • PyYAML

Install all dependencies using the provided requirements.txt:

pip install -r requirements.txt

Contributing

Contributions are welcome! Please follow these steps to contribute:

  1. Fork the Repository

  2. Create a Feature Branch

    git checkout -b feature/YourFeature
  3. Commit Your Changes

    git commit -m "Add your feature"
  4. Push to the Branch

    git push origin feature/YourFeature
  5. Open a Pull Request

    Provide a clear description of your changes and submit the pull request for review.

License

This project is licensed under the MIT License.


Feel free to reach out if you have any questions or need further assistance!

Human Clickme Data Processing

GPU-Accelerated Blurring Feature

The blurring step has been significantly optimized by leveraging GPU acceleration. The script now:

  1. Pre-processes clickmaps in parallel on CPU with joblib
  2. Runs blurring in batches on GPU for maximum performance
  3. Post-processes results in parallel on CPU with joblib

How to Use GPU Acceleration

The GPU acceleration is enabled by default. You can control it with the following config parameters:

  • use_gpu_blurring: Boolean to enable/disable GPU acceleration (default: true)
  • gpu_batch_size: Number of images to process in each GPU batch (default: 32)

Example config:

experiment_name: my_experiment
use_gpu_blurring: true
gpu_batch_size: 64
# ... other parameters as usual

Performance Considerations

  • The optimal batch size depends on your GPU memory. Larger batches generally provide better performance but require more memory.
  • You may need to adjust the batch size based on your image dimensions and GPU memory.
  • If you experience out-of-memory errors, try reducing the batch size.

Implementation Details

The implementation splits the work into three phases:

  1. CPU-parallel pre-processing: Creates binary clickmaps from click coordinates
  2. GPU batch processing: Applies blurring to multiple clickmaps simultaneously
  3. CPU-parallel post-processing: Filters and processes the blurred maps

This approach significantly reduces processing time compared to the previous implementation where blurring was done sequentially.
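
A minimal sketch of the GPU batch-blurring phase using a Gaussian convolution in PyTorch (hypothetical helpers; the real code's kernel construction and batching may differ):

import torch
import torch.nn.functional as F

def gaussian_kernel(kernel_size, sigma):
    """Build a normalized 2D Gaussian kernel shaped (1, 1, k, k) for conv2d."""
    ax = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return (kernel / kernel.sum()).view(1, 1, kernel_size, kernel_size)

def blur_batch(clickmaps, blur_size=21, device=None):
    """Blur a batch of clickmaps (N, H, W) with one conv2d call, on GPU if available."""
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.as_tensor(clickmaps, dtype=torch.float32, device=device).unsqueeze(1)
    kernel = gaussian_kernel(blur_size, sigma=blur_size / 6.0).to(device)
    blurred = F.conv2d(x, kernel, padding=blur_size // 2)
    return blurred.squeeze(1).cpu().numpy()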

GPU Acceleration and Batch Size Configuration

The processing pipeline has been optimized for GPU acceleration with large batch processing. By default, the batch size is set to 1024 for both GPU operations and correlation computations.

Batch Size Settings

You can control batch sizes through the configuration file:

# GPU and parallelization settings
gpu_batch_size: 1024        # Batch size for GPU blurring operations
correlation_batch_size: 1024 # Batch size for correlation computations
n_jobs: -1                  # Number of CPU jobs (-1 for all cores)

If you experience out-of-memory errors on your GPU, try reducing the batch sizes.

Updating Existing Configs

To update all your existing config files with these batch size settings, run:

python update_configs_with_batch_size.py

This will add gpu_batch_size: 1024 and correlation_batch_size: 1024 to all your YAML configuration files.

Template Config

A template config file with these settings is available at configs/batch_size_template.yaml.

ClickMe Data Processing - Optimized for Large Datasets

This repository contains optimized tools for processing the ClickMe dataset, particularly when dealing with very large datasets (6M+ trials).

Optimizations

The code has been optimized with several performance improvements:

  1. Cython-accelerated functions - Core computational functions rewritten in Cython for significant speed improvements
  2. HDF5 storage - Optional HDF5 file format for storing clickmaps, much faster than individual numpy files
  3. Parallel processing - Multi-core processing for CPU-bound tasks
  4. GPU acceleration - CUDA-optimized blurring and convolution operations
  5. Memory management - Improved chunk processing to minimize memory usage
  6. Optimized algorithms - Faster implementations of common operations like duplicate detection
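
As an illustration of the HDF5 option, all clickmaps can be written into a single compressed file instead of many individual numpy files; a rough sketch with h5py (the dataset layout here is an assumption, not the script's exact schema):

import h5py
import numpy as np

# Hypothetical: clickmaps maps image keys to (H, W) float arrays.
clickmaps = {"mouse/372_41138_81919_renders_00017.png": np.zeros((224, 224), np.float32)}

with h5py.File("assets/clickmaps.h5", "w") as f:
    for key, heatmap in clickmaps.items():
        # gzip compression keeps the single file small; one dataset per image key.
        f.create_dataset(key, data=heatmap, compression="gzip")

# Read back a single map later without loading the whole file.
with h5py.File("assets/clickmaps.h5", "r") as f:
    heatmap = f["mouse/372_41138_81919_renders_00017.png"][:]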

Installation

Before running the optimized code, you need to compile the Cython extensions:

# Install required dependencies
pip install cython numpy scipy h5py

# Compile Cython extensions
python setup.py build_ext --inplace

Usage

The script can be run with a config file:

python clickme_prepare_maps_for_modeling.py configs/your_config.yaml

Configuration Options

The following configuration options are available to control the optimization:

# Core settings (existing)
clickme_data: "path/to/clickme_data.csv"
filter_mobile: true
assets: "assets"
example_image_output_dir: "examples"
blur_size: 15
image_shape: [224, 224]
min_clicks: 4
max_clicks: 100
min_subjects: 3
percentile_thresh: 50
experiment_name: "experiment1"
processed_medians: "processed_medians.json"

# New optimization settings
output_format: "hdf5"  # "hdf5", "numpy", or "auto" (auto selects based on dataset size)
use_cython: true  # Use Cython-optimized functions (much faster)
chunk_size: 100000  # Number of unique images per chunk
parallel_clickmap_processing: true  # Use parallel processing for clickmap file processing
parallel_prepare_maps: true  # Use parallel processing for map preparation
use_gpu_blurring: true  # Use GPU for blurring operations
gpu_batch_size: 32  # Batch size for GPU operations
n_jobs: -1  # Number of CPU jobs (-1 = all cores)

Performance Tips

For extremely large datasets (6M+ trials):

  1. Use HDF5 format - Set output_format: "hdf5" to avoid slow file I/O with individual numpy files
  2. Enable Cython - Keep use_cython: true for maximum performance
  3. Adjust chunk size - If you encounter memory issues, reduce chunk_size to a smaller value
  4. Monitor GPU memory - If running out of GPU memory, reduce gpu_batch_size
  5. Use compression - HDF5 compression is enabled by default, but you can adjust the compression level in the code
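
As an illustration of the chunking idea, a very large trials CSV can be streamed with pandas rather than loaded at once (the script's own chunk_size applies to unique images, so this is only an analogy; the file path comes from the example config below):

import pandas as pd

n_rows = 0
# Stream the trials file in manageable pieces instead of loading 6M+ rows at once.
for chunk in pd.read_csv("data/clickme_6m_trials.csv", chunksize=100_000):
    n_rows += len(chunk)  # stand-in for real per-chunk processing
print(n_rows)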

Example Config

Here's an example config for processing a very large dataset:

# Data settings
clickme_data: "data/clickme_6m_trials.csv"
filter_mobile: true
assets: "assets/large_dataset"
example_image_output_dir: "examples/large_dataset"
blur_size: 15
image_shape: [224, 224]
min_clicks: 4
max_clicks: 100
min_subjects: 3
percentile_thresh: 50
experiment_name: "clickme_large"
processed_medians: "clickme_large_medians.json"

# Optimization settings for large dataset
output_format: "hdf5"
use_cython: true
chunk_size: 50000  # Reduced chunk size for memory management
parallel_clickmap_processing: true
parallel_prepare_maps: true
use_gpu_blurring: true
gpu_batch_size: 16  # Smaller batch size to avoid GPU OOM
n_jobs: -1
