HDTCatTree to create an HDT with low resources using HDTCat #172

Closed

@ate47 ate47 commented Sep 16, 2022

This pull request adds a new, small method for generating an HDT: catTree.

This method creates several small HDTs using the generateHDT method and merges them with HDTCat. This reduces memory usage and makes it possible to create an HDT on a machine that does not have enough memory to hold it.
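The idea can be sketched with a self-contained toy model. Everything in it is a hypothetical stand-in, not hdt-java code: a "part" is just a sorted set of strings, `cat` plays the role of HDTCat, and the merge schedule (merge two parts as soon as they sit at the same tree level) is an assumption suggested by the name catTree, since the description does not spell out the schedule.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.TreeSet;

// Toy model of the cat-tree idea. None of these types come from hdt-java.
class CatTreeSketch {
    // A "mini HDT" stand-in: a sorted set of triples plus its level in the merge tree.
    static final class Part {
        final TreeSet<String> triples;
        final int level;

        Part(TreeSet<String> triples, int level) {
            this.triples = triples;
            this.level = level;
        }
    }

    // Stand-in for HDTCat: merge two parts into one part at the next level.
    static Part cat(Part older, Part newer) {
        TreeSet<String> merged = new TreeSet<>(older.triples);
        merged.addAll(newer.triples);
        return new Part(merged, Math.max(older.level, newer.level) + 1);
    }

    // Reads the stream chunk by chunk, builds one part per chunk, and merges
    // parts of equal level so that only a few parts are pending at any time.
    static TreeSet<String> catTree(Iterator<String> triples, int maxTriplesPerChunk) {
        if (maxTriplesPerChunk < 1) {
            throw new IllegalArgumentException("maxTriplesPerChunk must be >= 1");
        }
        Deque<Part> pending = new ArrayDeque<>();
        while (triples.hasNext()) {
            // Build one small part from at most maxTriplesPerChunk triples.
            TreeSet<String> chunk = new TreeSet<>();
            int read = 0;
            while (triples.hasNext() && read < maxTriplesPerChunk) {
                chunk.add(triples.next());
                read++;
            }
            Part current = new Part(chunk, 0);
            // Merge upward while the most recent pending part has the same level.
            while (!pending.isEmpty() && pending.peek().level == current.level) {
                current = cat(pending.pop(), current);
            }
            pending.push(current);
        }
        if (pending.isEmpty()) {
            return new TreeSet<>();
        }
        // Merge whatever is left into the final result.
        Part result = pending.pop();
        while (!pending.isEmpty()) {
            result = cat(pending.pop(), result);
        }
        return result.triples;
    }
}
```

With this schedule the pending parts always have distinct levels, so their number grows only logarithmically with the number of chunks. Whether the actual implementation uses the same schedule should be checked against the code in the pull request.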

API Changes

It adds 3 new public methods to HDTManager, plus 3 abstract methods that implementations must provide:

public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, String rdfFileName, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, InputStream rdfStream, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
public static HDT catTree(RDFFluxStop fluxStop, HDTSupplier supplier, Iterator<TripleString> iterator, String baseURI, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;

protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, String filename, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, InputStream stream, String baseURI, RDFNotation rdfNotation, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;
protected abstract HDT doHDTCatTree(RDFFluxStop fluxStop, HDTSupplier supplier, Iterator<TripleString> iterator, String baseURI, HDTOptions hdtFormat, ProgressListener listener) throws IOException, ParserException;

It also adds 2 new classes: HDTSupplier specifies how to build each HDT, and RDFFluxStop specifies when to stop reading the RDF stream.

Both HDTSupplier and RDFFluxStop provide static methods to quickly create instances:

static RDFFluxStop noLimit();
static RDFFluxStop countLimit(long maxTriple);
static RDFFluxStop sizeLimit(long maxSize);
static HDTSupplier memory();

Multiple limits can also be combined with the and and or methods:

RDFFluxStop and(RDFFluxStop other);
RDFFluxStop or(RDFFluxStop other);
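To illustrate how such limits might compose, here is a minimal self-contained sketch. The `ChunkLimit` interface, its `canHandle`/`restart` methods, and the chosen semantics (and keeps reading only while both limits allow it, or keeps reading while at least one does) are assumptions made for this example; only the factory and combinator names mirror the RDFFluxStop methods listed above, and the real contract may differ.

```java
// Hypothetical stand-in for a stream limit; not the RDFFluxStop type from the PR.
interface ChunkLimit {
    // Returns true if the current chunk may still accept a triple of this size.
    boolean canHandle(long tripleBytes);

    // Resets the internal counters when a new chunk starts.
    void restart();

    // Never stops the stream.
    static ChunkLimit noLimit() {
        return new ChunkLimit() {
            public boolean canHandle(long tripleBytes) { return true; }
            public void restart() { }
        };
    }

    // Accepts at most maxTriples triples per chunk.
    static ChunkLimit countLimit(long maxTriples) {
        return new ChunkLimit() {
            private long count;
            public boolean canHandle(long tripleBytes) { return count++ < maxTriples; }
            public void restart() { count = 0; }
        };
    }

    // Accepts triples while the accumulated size stays within maxBytes.
    static ChunkLimit sizeLimit(long maxBytes) {
        return new ChunkLimit() {
            private long bytes;
            public boolean canHandle(long tripleBytes) {
                bytes += tripleBytes;
                return bytes <= maxBytes;
            }
            public void restart() { bytes = 0; }
        };
    }

    // Continue only while BOTH limits allow it (assumed semantics).
    default ChunkLimit and(ChunkLimit other) {
        ChunkLimit self = this;
        return new ChunkLimit() {
            public boolean canHandle(long tripleBytes) {
                // Evaluate both sides so that each limit updates its own counter.
                boolean a = self.canHandle(tripleBytes);
                boolean b = other.canHandle(tripleBytes);
                return a && b;
            }
            public void restart() { self.restart(); other.restart(); }
        };
    }

    // Continue while AT LEAST ONE limit allows it (assumed semantics).
    default ChunkLimit or(ChunkLimit other) {
        ChunkLimit self = this;
        return new ChunkLimit() {
            public boolean canHandle(long tripleBytes) {
                boolean a = self.canHandle(tripleBytes);
                boolean b = other.canHandle(tripleBytes);
                return a || b;
            }
            public void restart() { self.restart(); other.restart(); }
        };
    }
}
```

Under these assumptions, `ChunkLimit.countLimit(3).and(ChunkLimit.sizeLimit(100))` ends a chunk as soon as either the triple count or the byte budget is exhausted, while the `or` variant ends it only once both are.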

Core changes

  • Implementation of HDTCatTree with tests.

  • Some fixes to the header part in HDTCat.

  • Removal of the System.out.println calls during HDTCat in favor of the ProgressListener.

CLI Changes

Addition of the -cattreelocation and -cattree options to the rdf2hdt command to use HDTCatTree.
