
Missing BaseBatcher + Input Data Preparation for Model #10

Open
omidiu opened this issue Dec 2, 2024 · 0 comments
omidiu commented Dec 2, 2024

Thanks for Sharing Your Code!

Hi, first of all, thank you for making the code of your paper publicly available! 🙌 I'm currently using it in my research and have run into some issues/questions that I hope you can help clarify.

In the file src/batcher/make.py at line 3, there is the following import:

from src.batcher.base import BaseBatcher

However, it seems that BaseBatcher is not available in src/batcher/base.py.

I'm also trying to figure out how to feed data into the model and, in an abstract sense, understand the expected structure and shape of the input data. At this stage, my focus is on the input requirements rather than the specific nature of the data (e.g., whether it's EEG or not). Here's my current code:

import torch

from src.train_gpt import make_model, get_config

config = get_config()
config['num_hidden_layers'] = 6
model = make_model(config)
model.from_pretrained("model/pytorch_model.bin")

sample = torch.rand((1, 1, 22, 1080), dtype=torch.float32)  # Adjusted shape

input_dict = {
    'inputs': sample,
    'attention_mask': torch.ones((sample.size(0), sample.size(1)), dtype=torch.long)
}

model(input_dict)

Unfortunately, I encountered this error:

RuntimeError: The size of tensor a (2640) must match the size of tensor b (1080) at non-singleton dimension 2

The error seems to originate from the following snippet in embedder/csm_causal.py:

batch['inputs_embeds'] = torch.where(
    batch['modelling_mask'] == 1,
    self.msk_embed.repeat(
        input_shape[0],
        input_shape[1],
        1
    ),
    batch[inputs_key].to(torch.float)
)
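For what it's worth, the mismatch reproduces outside the repo with a plain `torch.where` call whose arguments don't broadcast. The shapes below are illustrative guesses based on the error message, not the model's actual internals:

```python
import torch

# Illustrative shapes only: torch.where broadcasts its three arguments,
# so a size-2640 last dimension against a size-1080 one raises the same
# RuntimeError reported above.
mask = torch.ones((1, 1, 2640), dtype=torch.bool)   # stand-in for modelling_mask
fill = torch.zeros((1, 1, 2640))                    # stand-in for the repeated msk_embed
inputs = torch.rand((1, 1, 1080))                   # stand-in for batch[inputs_key]

try:
    torch.where(mask, fill, inputs)
    mismatch = None
except RuntimeError as e:
    mismatch = str(e)

print(mismatch)
```

If I'm reading the traceback correctly, the `modelling_mask`/`msk_embed` side ends up with a length of 2640 along dimension 2 while my input contributes 1080, which suggests the model expects a different time dimension (or some chunking of it) than the one I'm passing in.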

Could you please clarify:

How should I properly prepare and format the input data for the model?
For example, what should the structure and shape of the input look like?
(I tried using batcher, but there is no BaseBatcher)

Please note that I'm comfortable with PyTorch but a beginner with EEG data. If there are any resources you'd recommend, I'd greatly appreciate them.
Also, I have read the discussion in issue #3, but I still need clarification on these points. Thank you!
