BiPaSs: A pathfinding system with a bidirectional A* search algorithm for finding paths in Wikidata

uniba-mi/bipass-wikidata-pathfinder

BiPaSs: A Wikidata Pathfinding system

Requirements | Query Factory | Dual-Entity Query Dataset | Pathfinding System | License

This repository contains all materials for reproducing the outcomes described in the research paper Further Investigation of Fast Pathfinding in Wikidata. This comprises the following artifacts:

  • A Query Factory for deriving a dual-entity query dataset for pathfinding in Wikidata
  • The derived dual-entity query dataset
  • A Pathfinding System for finding paths between arbitrary entities in Wikidata

The next paragraphs provide information about each artifact. This includes instructions for reproducing the results mentioned in the paper. Due to continuous updates made to Wikidata, rerunning the optimizer and the benchmark might yield slightly different results. To alleviate this problem, all information retrieved from Wikidata was cached and included in this repository.

Requirements

Only Docker and docker-compose are required to run the programs in this repository. All other dependencies are installed automatically by the corresponding Dockerfiles, which ensures reproducibility and ease of use. For installation guidance, see the official Docker documentation.

Query Factory

The Query Factory derives dual-entity queries for pathfinding in Wikidata from the TREC 2007 Million Queries Track dataset. The GENRE entity linker identifies and disambiguates the entities mentioned in the TREC queries.
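The derivation step can be pictured with a minimal sketch. It assumes that a query qualifies as dual-entity when the linker returns exactly two distinct Wikidata IDs for it; that criterion, the function name derive_dual_entity_rows, and the sample linker output are hypothetical and not taken from query_factory.py. Only the output field names match the dataset columns described in this README.

```python
def derive_dual_entity_rows(linked_queries):
    """Keep queries whose linker output holds exactly two distinct entities.

    Assumption: this selection rule is illustrative, not the repository's rule.
    """
    rows = []
    for trec_id, entity_ids in linked_queries:
        # Remove duplicate IDs while preserving their order of appearance.
        distinct = list(dict.fromkeys(entity_ids))
        if len(distinct) == 2:
            rows.append(
                {
                    "wikidata_id_a": distinct[0],
                    "wikidata_id_b": distinct[1],
                    "trec_id": trec_id,
                }
            )
    return rows


# Hypothetical linker output: (TREC query ID, Wikidata IDs found in the query).
linked = [
    ("1001", ["Q42", "Q5"]),
    ("1002", ["Q64"]),
    ("1003", ["Q90", "Q90", "Q142"]),
]
rows = derive_dual_entity_rows(linked)
```

In this sketch, queries with fewer or more than two distinct entities are simply dropped.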

Usage

To run the Query Factory proceed as follows:

  1. Select the TREC file from which queries should be derived by adjusting the commented parts in query_factory.py.
  2. Run docker compose run query_factory from the root directory.
  3. In the newly opened bash shell, run factory 07 to start the Query Factory. Warning: This will overwrite the dual-entity query dataset that is already present.

Dual-Entity Query Dataset

The dual-entity query dataset derived using the Query Factory is included in this repository. It is stored in CSV format with the following columns:

  • wikidata_id_a: The Wikidata ID of the first entity of the query
  • wikidata_id_b: The Wikidata ID of the second entity of the query
  • trec_id: The ID of the original TREC query
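As a rough illustration of how the dataset could be consumed, the following sketch parses rows with the three columns listed above. It assumes a comma-delimited file whose header row uses exactly these column names; the sample rows and the function name load_queries are made up for the example and are not taken from the actual dataset.

```python
import csv
import io

# Hypothetical sample in the assumed layout (header row, comma-separated).
SAMPLE = (
    "wikidata_id_a,wikidata_id_b,trec_id\n"
    "Q42,Q5,1001\n"
    "Q90,Q142,1003\n"
)


def load_queries(handle):
    """Return a list of (entity_a, entity_b, trec_id) tuples from a CSV handle."""
    reader = csv.DictReader(handle)
    return [
        (row["wikidata_id_a"], row["wikidata_id_b"], row["trec_id"])
        for row in reader
    ]


queries = load_queries(io.StringIO(SAMPLE))
```

To read the real file, pass an open file handle instead of the in-memory sample.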

Pathfinding System

This artifact comprises three components that together implement the pathfinding. The pathfinder component contains the pathfinding algorithm itself and interacts with two APIs over HTTP: it issues queries on Wikidata through the wikidata_api and calculates semantic distances between entities through the wembed_api.
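The interplay of these components can be sketched as a bidirectional best-first search. This is a simplified illustration and not the repository's Rust implementation: the neighbors and distance callbacks stand in for calls to the wikidata_api and the wembed_api, the graph is assumed to be undirected, the search parameters alpha, beta, and gamma are not modeled, and the returned path is not guaranteed to be the shortest one. All names in the block are hypothetical.

```python
import heapq


def _join(meet, fwd, bwd):
    """Stitch both parent chains into one path that passes through `meet`."""
    path = []
    node = meet
    while node is not None:  # walk back to the start
        path.append(node)
        node = fwd[node]
    path.reverse()
    node = bwd[meet]
    while node is not None:  # walk on to the goal
        path.append(node)
        node = bwd[node]
    return path


def bidirectional_search(start, goal, neighbors, distance):
    """Alternate between a forward and a backward frontier until they meet."""
    if start == goal:
        return [start]
    # Heap entries: (priority, hops so far, node); one heap per direction.
    frontiers = [[(distance(start, goal), 0, start)],
                 [(distance(goal, start), 0, goal)]]
    parents = [{start: None}, {goal: None}]
    targets = [goal, start]
    side = 0
    while frontiers[0] and frontiers[1]:
        _, cost, node = heapq.heappop(frontiers[side])
        for nxt in neighbors(node):
            if nxt in parents[side]:
                continue  # already discovered from this direction
            parents[side][nxt] = node
            if nxt in parents[1 - side]:
                return _join(nxt, parents[0], parents[1])
            priority = cost + 1 + distance(nxt, targets[side])
            heapq.heappush(frontiers[side], (priority, cost + 1, nxt))
        side = 1 - side  # switch direction after each expansion
    return None


# Toy undirected graph and a zero-distance stub (assumptions for the demo).
GRAPH = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"], "Z": []}
path = bidirectional_search("A", "E", GRAPH.__getitem__, lambda a, b: 0.0)
```

With the zero-distance stub the search behaves like a plain bidirectional breadth-first search; a meaningful distance function would steer each frontier toward the opposite endpoint.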

Usage

To run the Pathfinding System proceed as follows:

  1. Launch the Wikidata API via docker-compose run --service-ports wikidata_api in a separate bash.
  2. Launch the Wembed API via docker-compose run --service-ports wembed_api in a separate bash.
  3. Run docker-compose run pathfinder in a separate bash to launch the Pathfinder component. There are several commands that can be used in this new bash:
    1. Run cargo run -- playground to launch the pathfinder on a few example queries.
    2. Run cargo run -- optimizer to run the optimizer for fitting the search parameters alpha, beta, and gamma. Warning: This will overwrite the already present optimizer results file.
    3. Run cargo run -- benchmark to run the benchmark. Warning: This will overwrite the already present benchmark results files.

To enable debug-level logging, append the debug flag to any of the commands in steps 3.1, 3.2, and 3.3. For example, cargo run -- playground debug runs the pathfinder with verbose logging.

License

See LICENSE
