Failed to run for the dataset DRKG #21
Comments
Can you give the following information: |
Hi DRKG team,
Thanks for your quick response.
The versions are
dgl 0.4.3
dglke 0.1.1
torch 1.6.0
CPU
Best,
Chia-Jung
On Jan 14, 2021, at 10:19 PM, xiang song(charlie.song) <[email protected]> wrote:
Can you give the following information:
DGL-ke version
DGL version
PyTorch version
CPU or GPU (it seems it is CPU)
|
Can you try --num_proc 1? |
Thanks a lot! It works.
The machine has 64 G memory.
On Jan 14, 2021, at 10:29 PM, xiang song(charlie.song) <[email protected]> wrote:
Can you try --num_proc 1?
How much memory does your CPU machine have?
|
Maybe it is due to an OOM problem. You can increase --num_thread accordingly and also try --num_proc 2 or 4. |
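As a rough sketch of the OOM intuition above: each training process may hold its own working copy of the entity-embedding table, so memory grows with --num_proc. The numbers below are illustrative assumptions only (the entity count is an order-of-magnitude guess, not DRKG's actual statistic; hidden_dim 400 comes from the command in this issue):

```python
def embedding_bytes(n_entities, hidden_dim, n_proc, bytes_per_value=4):
    """Rough upper bound on entity-embedding memory if each of
    n_proc training processes holds its own float32 working copy."""
    return n_entities * hidden_dim * bytes_per_value * n_proc

# Illustrative only: ~100k entities, hidden_dim 400 as in the issue's command.
one = embedding_bytes(100_000, 400, 1)
eight = embedding_bytes(100_000, 400, 8)
print(f"1 proc: {one / 2**30:.2f} GiB, 8 procs: {eight / 2**30:.2f} GiB")
```

On a 64 GB machine these tables alone would not exhaust RAM, which is consistent with the reporter later seeing only 2% memory use; a "Bus error" during multiprocess training can also come from exhausted shared memory (for example, a small /dev/shm inside a container), which is worth checking separately.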
Thanks.
Only --num_proc 1 works, and it uses only 2% of memory.
There might be other causes, but this is good enough for me for now.
Thanks again.
|
Hello @classicsong, I tried to run the Train_embeddings notebook in Anaconda Jupyter using CPU and got the error below. If I remove !DGLBACKEND=pytorch, [...]. Any advice/ideas on how to fix the issue? |
Hi,
Thanks for all the work. It looks amazing and I am looking forward to integrating my data with other diseases.
It's a pity that I cannot run the code for training DRKG on my machine, which only has CPUs.
The command is
"dglke_train --dataset DRKG --data_path ./train --data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' --model_name TransE_l2 --batch_size 64 --neg_sample_size 256 --hidden_dim 400 --gamma 12.0 --lr 0.1 --max_step 100000 --log_interval 1000 --batch_size_eval 16 -adv --regularization_coef 1.00E-07 --test --num_thread 1 --num_proc 8 --neg_sample_size_eval 10000"
and the output is
Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293713 test triples.
|Train|: 5286834
random partition 5286834 edges into 8 parts
part 0 has 660855 edges
part 1 has 660855 edges
part 2 has 660855 edges
part 3 has 660855 edges
part 4 has 660855 edges
part 5 has 660855 edges
part 6 has 660855 edges
part 7 has 660849 edges
/opt/conda/lib/python3.7/site-packages/dgl/base.py:25: UserWarning: multigraph will be deprecated.DGL will treat all graphs as multigraph in the future.
warnings.warn(msg, warn_type)
|valid|: 293713
|test|: 293713
Bus error (core dumped)
The command works fine for other data.
"dglke_train --model_name TransE_l2 --dataset FB15k --batch_size 1000 --neg_sample_size 200 --hidden_dim 400 --gamma 19.9 --lr 0.25 --max_step 3000 --log_interval 100 --batch_size_eval 16 --test -adv --regularization_coef 1.00E-09 --num_thread 1 --num_proc 8" worked successfully.
Thanks again for your sharing.
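For reference, the fix that worked in this thread was reducing the process count. A sketch of the same DRKG command with --num_proc set to 1 (all other flags unchanged from the original command above):

```shell
dglke_train --dataset DRKG --data_path ./train \
    --data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv \
    --format 'raw_udd_hrt' --model_name TransE_l2 \
    --batch_size 64 --neg_sample_size 256 --hidden_dim 400 \
    --gamma 12.0 --lr 0.1 --max_step 100000 --log_interval 1000 \
    --batch_size_eval 16 -adv --regularization_coef 1.00E-07 \
    --test --num_thread 1 --num_proc 1 --neg_sample_size_eval 10000
```

If single-process training is too slow, the suggestion above was to work back up to --num_proc 2 or 4 while increasing --num_thread, watching memory (and shared memory) as you go.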