Merge of hdtCat #74

Merged
merged 7 commits into from
Oct 17, 2019

D063520
Contributor

@D063520 D063520 commented Jul 13, 2018

This branch contains an implementation of hdtCat, an algorithm and command-line tool that merges two HDT files without decompressing them. In particular, it makes it possible to merge existing HDT files and to serialize large RDF files to HDT with a low memory footprint. On a 16 GB machine we were able to generate an HDT file with 5 billion triples. The code does not currently work under Windows; this is a known issue.
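
The description above does not spell out the algorithm. As a rough intuition for why a merge can run with little memory, here is a minimal sketch that assumes both inputs can be read as sorted, duplicate-free streams of terms (an assumption made for illustration, not something stated in this thread). `SortedMergeSketch` and `mergeSorted` are hypothetical names and are not part of hdt-java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative only: NOT the hdtCat implementation.
final class SortedMergeSketch {

    // Merges two sorted, duplicate-free streams into one sorted, duplicate-free list.
    // Only one pending element per input is held at a time; a real tool would
    // write each merged element to disk instead of collecting it in a list.
    static List<String> mergeSorted(Iterator<String> a, Iterator<String> b) {
        List<String> out = new ArrayList<>();
        String x = a.hasNext() ? a.next() : null;
        String y = b.hasNext() ? b.next() : null;
        while (x != null || y != null) {
            // An exhausted input compares as "larger" so the other side is drained.
            int cmp = (x == null) ? 1 : (y == null) ? -1 : x.compareTo(y);
            if (cmp < 0) {
                out.add(x);
                x = a.hasNext() ? a.next() : null;
            } else if (cmp > 0) {
                out.add(y);
                y = b.hasNext() ? b.next() : null;
            } else {
                // Same term in both inputs: emit once, advance both.
                out.add(x);
                x = a.hasNext() ? a.next() : null;
                y = b.hasNext() ? b.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", "b", "d");
        List<String> second = Arrays.asList("b", "c");
        System.out.println(mergeSorted(first.iterator(), second.iterator())); // [a, b, c, d]
    }
}
```

The point of the sketch is only that a sorted merge needs a constant amount of state per input, which is consistent with the low memory footprint reported above.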

@dinikolop

I noticed that when two identical files are merged with hdtCat, the triples are duplicated. This produces a larger output file, for which hdtSearch returns twice as many results as for the original file.

@D063520
Contributor Author

D063520 commented Sep 27, 2018

Hi,
thank you very much for noting this, and particularly for taking a look at the pull request. I lost a hash and an equals function while refactoring the code. Now it works. I also added a new test that checks this.

Cheers
D063520
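
The affected class is not shown in this thread. As a minimal sketch of how a missing hash and equals function can lead to duplicated triples, assume the merge deduplicates through a hash-based collection (an assumption made for illustration). `IdTriple` and `TripleDedupDemo` are hypothetical names, not classes from hdt-java.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a triple of dictionary IDs; not the hdt-java class.
final class IdTriple {
    final long subject;
    final long predicate;
    final long object;

    IdTriple(long subject, long predicate, long object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // Without this override, two triples with the same IDs are only equal
    // if they are the same object instance.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof IdTriple)) return false;
        IdTriple t = (IdTriple) other;
        return subject == t.subject && predicate == t.predicate && object == t.object;
    }

    // Must agree with equals so that equal triples land in the same hash bucket.
    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}

final class TripleDedupDemo {
    // Counts the distinct triples left after adding both inputs to a HashSet.
    static int countDistinct(long[][] first, long[][] second) {
        Set<IdTriple> merged = new HashSet<>();
        for (long[] t : first) merged.add(new IdTriple(t[0], t[1], t[2]));
        for (long[] t : second) merged.add(new IdTriple(t[0], t[1], t[2]));
        return merged.size();
    }

    public static void main(String[] args) {
        long[][] file = {{1, 1, 2}, {1, 2, 3}};
        // Merging a file with itself should not grow the set of triples.
        System.out.println(countDistinct(file, file)); // 2
    }
}
```

If the two overrides are removed from `IdTriple`, the same call returns 4, which matches the doubled result counts reported earlier in the thread.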

@mielvds mielvds changed the title from "This branch contains an implementation of hdtCat, an algorithm and command line tool to merge 2 hdt files without decompressing them. This especially allows to merge HDT files and serialize big RDF file to HDT with low memory footprint. On a 16Gb machine we were able to generate an HDT file with 5 billion triples. The code is not working under Windows. This issue is known to us." to "Merge of hdtCat" on Oct 17, 2019
@mielvds
Member

mielvds commented Oct 17, 2019

Guys, love this work. Could you list the major changes to the original code/API? Are there any breaking changes?

@D063520
Contributor Author

D063520 commented Oct 17, 2019

There are basically no breaking changes, only additions alongside the existing code. Enjoy!

@mielvds mielvds merged commit 4e2da84 into rdfhdt:master Oct 17, 2019
@mielvds
Member

mielvds commented Oct 17, 2019

Alright, cool, merged! Would you mind extending the README to document the new feature a bit?

@D063520
Contributor Author

D063520 commented Oct 17, 2019

Cool, thank you! Somehow there were two README files; I fixed that. The description of hdtCat is in the hdt-java-cli README, which I have already updated.
