The code and datasets for the paper "Toward Practical Entity Alignment Method Design: Insights from New Highly Heterogeneous Knowledge Graph Datasets", published at The Web Conference (WWW) 2024.
Requirements:
Python
PyTorch
transformers
SentencePiece
scipy
numpy
pandas
tqdm
networkx
gensim
The model runs in 3 steps:
Step 1. To get the name embeddings of entities, run:
python process_name_embedding.py --data DATASET
Here DATASET can be icews_wiki, icews_yago, or any dataset you place in the data directory.
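Name embeddings from the two knowledge graphs are commonly compared with cosine similarity. The sketch below is not the repository's evaluation code: it assumes embeddings have been loaded as plain Python lists keyed by hypothetical entity IDs, and it measures how often the correct counterpart ranks among the k most similar target entities (Hits@k).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hits_at_k(src_emb, tgt_emb, gold, k=1):
    """Fraction of source entities whose gold target is among the top-k most similar targets.

    src_emb, tgt_emb: dict mapping a (hypothetical) entity ID to its embedding.
    gold: dict mapping a source entity ID to its reference target entity ID.
    """
    hits = 0
    for s, t_true in gold.items():
        # Rank every target entity by similarity to the source entity.
        ranked = sorted(tgt_emb, key=lambda t: cosine(src_emb[s], tgt_emb[t]), reverse=True)
        if t_true in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy example with two entities per graph.
src = {"e1": [1.0, 0.0], "e2": [0.0, 1.0]}
tgt = {"f1": [0.9, 0.1], "f2": [0.1, 0.9]}
gold = {"e1": "f1", "e2": "f2"}
print(hits_at_k(src, tgt, gold, k=1))
```

Sorting all targets per source entity is quadratic and only meant for illustration; real datasets would use a batched matrix product instead.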
Step 2. We use Fualign to obtain entity embeddings via DeepWalk. To get the structure embeddings, run:
cd feature_perprocessing
python preprocess.py --l DATASET
python longterm/main.py \
--input "data/DATASET/deepwalk.data" \
--output "data/DATASET/longterm.vec" \
--node2rel "data/DATASET/node2rel" \
--q 0.7
python get_deep_emb.py --path "data/DATASET/"
DATASET is the same as in Step 1.
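DeepWalk first turns the graph into a corpus of random walks and then trains a skip-gram model on those walks. The sketch below shows only the walk-generation part with uniform truncated walks over an undirected graph built from hypothetical (head, relation, tail) triples. It is a simplification: the Step 2 command passes --q 0.7 and a node2rel file, which suggests the repository's walks are biased and relation-aware, and that is not modelled here.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Undirected adjacency lists from (head, relation, tail) triples; relations are ignored."""
    adj = defaultdict(list)
    for h, _, t in triples:
        adj[h].append(t)
        adj[t].append(h)
    return adj


def random_walks(adj, num_walks=10, walk_length=8, seed=0):
    """Uniform truncated random walks, the corpus-generation step of DeepWalk."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


triples = [("a", "r1", "b"), ("b", "r2", "c")]
walks = random_walks(build_adjacency(triples), num_walks=2, walk_length=5)
print(walks[0])
```

In a full DeepWalk pipeline these walks would then be passed as sentences to a skip-gram model, for example gensim's Word2Vec with sg=1.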
Step 3. To run Simple-HHEA, use:
python main_SimpleHHEA.py \
--data DATASET \
--lr 0.01 \
--wd 0.001 \
--gamma 1.0 \
--epochs 1500
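For scripting experiments it can help to mirror the documented command-line options in code. The parser below is a hypothetical sketch, not the actual argument parser in main_SimpleHHEA.py: it uses the values from the command above as defaults, and reading --wd as weight decay is an assumption based on the usual abbreviation.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the main options shown for main_SimpleHHEA.py."""
    p = argparse.ArgumentParser(description="Simple-HHEA options (sketch)")
    p.add_argument("--data", required=True, help="dataset name, e.g. icews_wiki")
    p.add_argument("--lr", type=float, default=0.01, help="learning rate")
    p.add_argument("--wd", type=float, default=0.001, help="presumably weight decay")
    p.add_argument("--gamma", type=float, default=1.0, help="gamma hyperparameter")
    p.add_argument("--epochs", type=int, default=1500, help="number of training epochs")
    return p


args = build_parser().parse_args(["--data", "icews_wiki", "--lr", "0.005"])
print(args.data, args.lr, args.epochs)
```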
Use --add_noise and --noise_ratio to control whether noise is added to the name embeddings and how much.
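The README does not specify how the noise is applied. As one plausible scheme, stated here purely as an assumption, the sketch below perturbs the name embeddings of a randomly chosen noise_ratio fraction of entities with standard Gaussian noise; add_name_noise is a hypothetical helper, not a function from the repository.

```python
import random


def add_name_noise(embeddings, noise_ratio, seed=0):
    """Return a copy in which a noise_ratio fraction of entities get Gaussian-perturbed vectors.

    Hypothetical scheme: the actual noise model used by --add_noise may differ.
    """
    rng = random.Random(seed)
    ids = sorted(embeddings)
    n_noisy = int(round(noise_ratio * len(ids)))
    noisy_ids = set(rng.sample(ids, n_noisy))  # entities whose names get corrupted
    out = {}
    for e in ids:
        vec = embeddings[e]
        if e in noisy_ids:
            out[e] = [x + rng.gauss(0.0, 1.0) for x in vec]
        else:
            out[e] = list(vec)
    return out


clean = {f"e{i}": [0.0] * 4 for i in range(10)}
noisy = add_name_noise(clean, noise_ratio=0.3)
print(sum(noisy[e] != clean[e] for e in clean))
```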
Use --no_structure to remove the structure embeddings from the model.
Use --no_time to remove the time embeddings from the model.
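Ablation flags like these typically decide which feature blocks reach the encoder. The helper below is a hypothetical illustration of that idea using simple concatenation; Simple-HHEA's actual encoder may combine name, structure, and time features differently.

```python
def assemble_features(name_vec, struct_vec, time_vec, no_structure=False, no_time=False):
    """Concatenate per-entity feature blocks, dropping the ones disabled by ablation flags.

    Hypothetical helper: it only illustrates what --no_structure and --no_time remove.
    """
    parts = [name_vec]  # name embeddings are always kept
    if not no_structure:
        parts.append(struct_vec)
    if not no_time:
        parts.append(time_vec)
    return [x for part in parts for x in part]


print(assemble_features([0.1, 0.2], [0.3], [0.4], no_time=True))
```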
Alternatively, run:
bash run_exp.sh
to run Simple-HHEA directly on the icews_wiki dataset.
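The three steps can also be chained from Python. The sketch below does not reproduce run_exp.sh, whose contents are not shown here; it only replays the commands documented above. It assumes it is launched from the repository root, that Step 1 and Step 3 scripts live there, and that the Step 2 paths are relative to feature_perprocessing. pipeline_commands and run_pipeline are hypothetical names, and run_pipeline only prints the commands unless dry_run is set to False.

```python
import shlex
import subprocess


def pipeline_commands(dataset):
    """Return (working_dir, argv) pairs for the three documented steps."""
    data_dir = f"data/{dataset}"
    return [
        # Step 1: name embeddings (assumed to run from the repository root).
        (".", ["python", "process_name_embedding.py", "--data", dataset]),
        # Step 2: structure embeddings, run inside feature_perprocessing.
        ("feature_perprocessing", ["python", "preprocess.py", "--l", dataset]),
        ("feature_perprocessing", ["python", "longterm/main.py",
                                   "--input", f"{data_dir}/deepwalk.data",
                                   "--output", f"{data_dir}/longterm.vec",
                                   "--node2rel", f"{data_dir}/node2rel",
                                   "--q", "0.7"]),
        ("feature_perprocessing", ["python", "get_deep_emb.py", "--path", f"{data_dir}/"]),
        # Step 3: train and evaluate Simple-HHEA (assumed to run from the root).
        (".", ["python", "main_SimpleHHEA.py", "--data", dataset,
               "--lr", "0.01", "--wd", "0.001", "--gamma", "1.0", "--epochs", "1500"]),
    ]


def run_pipeline(dataset, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in pipeline_commands(dataset):
        print(f"[{cwd}] {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


run_pipeline("icews_wiki")
```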
If you are interested in or inspired by this work, please cite:
@article{jiang2023rethinking,
title={Rethinking GNN-based Entity Alignment on Heterogeneous Knowledge Graphs: New Datasets and A New Method},
author={Jiang, Xuhui and Xu, Chengjin and Shen, Yinghan and Su, Fenglong and Wang, Yuanzhuo and Sun, Fei and Li, Zixuan and Shen, Huawei},
journal={arXiv preprint arXiv:2304.03468},
year={2023}
}