Getting error when run vae_train.py #17

SejeongPark8354 · 2020-12-09T10:50:28Z

First of all, Thank you for your great research on molecule generation.
Nowadays, I am training my ZINC datasets with your vae_train.py (in generation folder).
When I run the code, I got the error like below.
This error occur occasionally. I think it depends on the batch.
Is there any solution for this problem?

  warnings.warn(warning.format(ret))
Model #Params: 160850K
[50] Beta: 0.100, KL: 19.11, loss: 57.167, Word: 10.76, 52.60, Topo: 80.77, Assm: 56.73, PNorm: 175.70, GNorm: 18.64
[100] Beta: 0.100, KL: 9.08, loss: 42.075, Word: 14.69, 59.69, Topo: 93.39, Assm: 75.03, PNorm: 236.81, GNorm: 14.60
[150] Beta: 0.100, KL: 9.66, loss: 39.316, Word: 16.71, 62.58, Topo: 96.62, Assm: 77.06, PNorm: 293.82, GNorm: 17.42
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "vae_train.py", line 81, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/hgnn.py", line 88, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 130, in forward
    hatom,_ = self.graph_encoder(*tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 30, in forward
    h = self.rnn(fmess, bgraph)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 105, in forward
    h,c = self.LSTM(fmess, h_nei, c_nei)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 92, in LSTM
    c = i * u + (f * c_nei).sum(dim=1)
RuntimeError: CUDA error: device-side assert triggered

The text was updated successfully, but these errors were encountered:

jks17 · 2021-03-01T22:20:17Z

I am also getting the above issue - Did you manage to find a fix @SejeongPark8354 ?

marshallcase · 2022-08-31T21:16:34Z

getting a very similar issue when running train_generator.py:

Namespace(anneal_iter=25000, anneal_rate=0.9, atom_vocab=<hgraph.vocab.Vocab object at 0x000001C10639ED48>, batch_size=20, clip_norm=5.0, depthG=15, depthT=15, diterG=3, diterT=1, dropout=0.0, embed_size=250, epoch=20, hidden_size=125, kl_anneal_iter=2000, latent_size=32, load_model=None, lr=0.001, max_beta=1.0, print_iter=50, rnn_type='LSTM', save_dir='ckpt/cyclic_truncated_pretrained', save_iter=5000, seed=7, step_beta=0.001, train='train_processed/cyclic_truncated_processed/', vocab='data/chembl/cyclic_peptide_vocab_truncated.txt', warmup=10000)
C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Model #Params: 1318K
  0%|▏                                                                              | 2/1000 [00:32<4:01:50, 14.54s/it]C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [40,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [41,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [42,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [43,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [44,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [45,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [46,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [47,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [48,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [49,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [50,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [51,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [52,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [53,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [54,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [55,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [56,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [57,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [58,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [59,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [44,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [45,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [46,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [47,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [48,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [49,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [50,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [51,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [52,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [53,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [54,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [55,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [56,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [57,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [58,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [59,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [60,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [61,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [62,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [63,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
  0%|▏                                                                              | 2/1000 [00:35<4:56:07, 17.80s/it]
Traceback (most recent call last):
  File "train_generator.py", line 92, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\hgnn.py", line 55, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 129, in forward
    tensors = self.embed_graph(graph_tensors)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 114, in embed_graph
    fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)
RuntimeError: CUDA error: device-side assert triggered

marshallcase · 2022-08-31T22:19:30Z

Actually, I think I figured it out. There's a parameter defined in mol_graph.py , MAX_POS = 20, which limits the E_apos matrix, E_pos matrix, and subsequently when in the enconder, the f_mess matrix will be out of index which is why you get the error.

I think it's an issue of molecule size and graph complexity - in the paper, there's a subscript:
"The number of possible attachments are limited because the number of attaching atoms between two motifs is small and the attaching points must be consecutive.3

3In our experiments, the number of possible attachments are
usually less than 20 for polymers and small molecules."

Bunnybeibei · 2023-02-21T01:51:03Z

I agree with the above person's advice. I first use "os.environ['CUDA_LAUNCH_BLOCKING'] = '1'" to locate the bug, I find there are some problem with "fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)". And then I use the slice to locate where the error is，I find the max number of fmess[:,3] is 22 while self.E_apos only has 20 dims. So I increase the MAX_POS in mol_graph.py and solve this problem. I think the operation would not affect the models, maybe waste some memory.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting error when run vae_train.py #17

Getting error when run vae_train.py #17

SejeongPark8354 commented Dec 9, 2020 •

edited

Loading

jks17 commented Mar 1, 2021

marshallcase commented Aug 31, 2022 •

edited

Loading

marshallcase commented Aug 31, 2022 •

edited

Loading

Bunnybeibei commented Feb 21, 2023 •

edited

Loading

Getting error when run vae_train.py #17

Getting error when run vae_train.py #17

Comments

SejeongPark8354 commented Dec 9, 2020 • edited Loading

jks17 commented Mar 1, 2021

marshallcase commented Aug 31, 2022 • edited Loading

marshallcase commented Aug 31, 2022 • edited Loading

Bunnybeibei commented Feb 21, 2023 • edited Loading

SejeongPark8354 commented Dec 9, 2020 •

edited

Loading

marshallcase commented Aug 31, 2022 •

edited

Loading

marshallcase commented Aug 31, 2022 •

edited

Loading

Bunnybeibei commented Feb 21, 2023 •

edited

Loading