
Getting error when run vae_train.py #17

Open
SejeongPark8354 opened this issue Dec 9, 2020 · 4 comments

SejeongPark8354 commented Dec 9, 2020

First of all, thank you for your great research on molecule generation.
I am currently training on my ZINC dataset with your vae_train.py (in the generation folder).
When I run the code, I get the error below.
The error only occurs occasionally, so I think it depends on the batch.
Is there a solution for this problem?

  warnings.warn(warning.format(ret))
Model #Params: 160850K
[50] Beta: 0.100, KL: 19.11, loss: 57.167, Word: 10.76, 52.60, Topo: 80.77, Assm: 56.73, PNorm: 175.70, GNorm: 18.64
[100] Beta: 0.100, KL: 9.08, loss: 42.075, Word: 14.69, 59.69, Topo: 93.39, Assm: 75.03, PNorm: 236.81, GNorm: 14.60
[150] Beta: 0.100, KL: 9.66, loss: 39.316, Word: 16.71, 62.58, Topo: 96.62, Assm: 77.06, PNorm: 293.82, GNorm: 17.42
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
[... the same assertion repeats for block [32,0,0], threads [85,0,0] through [120,0,0] ...]
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "vae_train.py", line 81, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/hgnn.py", line 88, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 130, in forward
    hatom,_ = self.graph_encoder(*tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 30, in forward
    h = self.rnn(fmess, bgraph)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 105, in forward
    h,c = self.LSTM(fmess, h_nei, c_nei)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 92, in LSTM
    c = i * u + (f * c_nei).sum(dim=1)
RuntimeError: CUDA error: device-side assert triggered
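
The assertion `srcIndex < srcSelectDimSize` appears to come from the CUDA kernel behind index_select: some index was greater than or equal to the size of the dimension being indexed. Because CUDA kernels run asynchronously, the Python line shown in the traceback (here the LSTM update in rnn.py) is not necessarily the operation that failed. A minimal sketch with a made-up 20-row table (the `lookup_rows` helper is hypothetical, not part of hgraph2graph) shows that the same out-of-range lookup fails immediately and readably on CPU:

```python
import torch


def lookup_rows(table: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Select rows of `table`; every index must satisfy 0 <= i < table.size(0)."""
    return table.index_select(0, idx)


# Made-up 20-row table; valid row indices are 0..19.
table = torch.eye(20)

print(lookup_rows(table, torch.tensor([0, 5, 19])).shape)

try:
    # 22 >= table.size(0): on CUDA this is the "srcIndex < srcSelectDimSize"
    # device-side assert; on CPU PyTorch raises right at this call instead.
    lookup_rows(table, torch.tensor([3, 22]))
except (IndexError, RuntimeError) as err:  # exception type varies by PyTorch version
    print(f"{type(err).__name__}: {err}")
```

Reproducing a failing batch on CPU, or on GPU with CUDA_LAUNCH_BLOCKING=1, makes the real failing call show up in the traceback.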

jks17 commented Mar 1, 2021

I am also getting this issue. Did you manage to find a fix, @SejeongPark8354?


marshallcase commented Aug 31, 2022

I'm getting a very similar issue when running train_generator.py:

Namespace(anneal_iter=25000, anneal_rate=0.9, atom_vocab=<hgraph.vocab.Vocab object at 0x000001C10639ED48>, batch_size=20, clip_norm=5.0, depthG=15, depthT=15, diterG=3, diterT=1, dropout=0.0, embed_size=250, epoch=20, hidden_size=125, kl_anneal_iter=2000, latent_size=32, load_model=None, lr=0.001, max_beta=1.0, print_iter=50, rnn_type='LSTM', save_dir='ckpt/cyclic_truncated_pretrained', save_iter=5000, seed=7, step_beta=0.001, train='train_processed/cyclic_truncated_processed/', vocab='data/chembl/cyclic_peptide_vocab_truncated.txt', warmup=10000)
C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Model #Params: 1318K
  0%|▏                                                                              | 2/1000 [00:32<4:01:50, 14.54s/it]C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [40,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
[... the same assertion repeats for block [146,0,0], threads [41,0,0] through [59,0,0], and for block [148,0,0], threads [44,0,0] through [62,0,0] ...]
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [63,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
  0%|▏                                                                              | 2/1000 [00:35<4:56:07, 17.80s/it]
Traceback (most recent call last):
  File "train_generator.py", line 92, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\hgnn.py", line 55, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 129, in forward
    tensors = self.embed_graph(graph_tensors)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 114, in embed_graph
    fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)
RuntimeError: CUDA error: device-side assert triggered


marshallcase commented Aug 31, 2022

Actually, I think I figured it out. mol_graph.py defines a parameter, MAX_POS = 20, which sets the size of the E_apos and E_pos matrices. When a molecule has more possible attachment positions than that, the position indices in fmess (fmess[:, 3] in the encoder) run past the end of E_apos, and index_select fails with the error above.

I think it's an issue of molecule size and graph complexity. The paper mentions this limit in a footnote:
"The number of possible attachments are limited because the number of attaching atoms between two motifs is small and the attaching points must be consecutive.³

³ In our experiments, the number of possible attachments are usually less than 20 for polymers and small molecules."
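A minimal sketch of the failure mode described in this comment, assuming (not taken from the hgraph source) that E_apos is a one-hot table built as torch.eye(MAX_POS), so index_select maps each attachment position to a one-hot row and any position ≥ MAX_POS simply has no row. The `embed_positions` helper is hypothetical; it checks the range first and raises a readable error instead of a device-side assert:

```python
import torch

MAX_POS = 20  # value reported in this thread for mol_graph.py

# Assumption: E_apos is a one-hot lookup table with MAX_POS rows.
E_apos = torch.eye(MAX_POS)


def embed_positions(positions: torch.Tensor) -> torch.Tensor:
    """One-hot encode attachment positions, failing with a readable error."""
    bad = positions[(positions < 0) | (positions >= E_apos.size(0))]
    if bad.numel() > 0:
        raise ValueError(
            f"attachment position {int(bad.max())} does not fit in E_apos "
            f"with {E_apos.size(0)} rows; increase MAX_POS"
        )
    # Each index selects one row of the identity matrix, i.e. a one-hot vector.
    return E_apos.index_select(0, positions)


print(embed_positions(torch.tensor([0, 3, 19])).shape)
```

On a GPU the boolean-mask check forces a device-to-host sync, so it is better suited to debugging runs or a one-off pass over the preprocessed data than to every training step.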


Bunnybeibei commented Feb 21, 2023

I agree with the advice above. I first set "os.environ['CUDA_LAUNCH_BLOCKING'] = '1'" to locate the bug and found the problem was in "fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)". Slicing the tensors to narrow it down, I found that the maximum value of fmess[:, 3] was 22, while self.E_apos has only 20 rows. Increasing MAX_POS in mol_graph.py solved the problem. I don't think this change affects the model, apart from perhaps using a bit more memory.
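The debugging steps in this comment can be sketched as follows. CUDA_LAUNCH_BLOCKING only takes effect if it is in the environment before CUDA is initialized, so set it at the very top of the script, before importing torch. The `required_max_pos` helper is hypothetical: it assumes you have collected the fmess[:, 3] column of each batch as a list of ints (for example with .tolist()) and returns the smallest table size that fits every index. Since indices start at 0, a maximum of 22 needs MAX_POS of at least 23, not 22:

```python
import os

# Must be set before CUDA is initialized; setting it before importing torch is safest.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def required_max_pos(position_batches):
    """Return the smallest MAX_POS that fits every position index seen.

    position_batches: iterable of integer sequences, e.g. the fmess[:, 3]
    column of each batch converted with .tolist() (hypothetical collection step).
    """
    largest = max(
        (max(batch) for batch in position_batches if len(batch) > 0),
        default=-1,
    )
    return largest + 1  # valid indices run from 0 to MAX_POS - 1


print(required_max_pos([[0, 4, 19], [3, 22]]))  # → 23
```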
