Streaming parsers #411
Conversation
I'd benefit from this feature, and so would (adding up the views) at least 983 + 467 + 5704 + 90 = 7244 other people. I'm going to take a look and see if I can get it to a state where the tests pass and there are no merge conflicts, although I'm not familiar with the codebase, and the (existing) code around it is a horrible mess in at least a couple of ways that immediately struck me.
I'll see if I can figure it all out, but no promises.
Good lord, 470 errors and 154 failures after merging this into today's code. I might give up on this exercise and leave it to somebody who understands both the codebase and RDF itself better than I do, but I'll keep poking a little first...
Step 1 should be to rebase this on the current master - it has changed a bit since I did this work.
All add/remove methods now raise an Exception if passed a graph. The Graph API works on Graph objects and will pack/unpack as required.
b56c8c7 to f6e6e01
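For context on the commit message above, here is a rough sketch of the contract it describes. None of these function names are RDFLib API; the point is only that the low-level add path now rejects Graph terms, while the Graph-level API converts ("packs") a Graph argument into a plain identifier before it reaches the store.

```python
# Illustrative sketch only (hypothetical functions, not RDFLib API) of the
# behaviour described in the commit message above.
from rdflib import Graph


def store_add(triple):
    # Low-level path: Graph objects are no longer accepted as terms.
    if any(isinstance(term, Graph) for term in triple):
        raise Exception("add() no longer accepts Graph objects as terms")
    # ... actually store the packed triple ...


def graph_add(triple):
    # Graph-level path: "pack" any Graph term down to its identifier
    # before handing the triple to the store; "unpacking" on retrieval
    # would wrap identifiers back into Graph objects.
    packed = tuple(
        term.identifier if isinstance(term, Graph) else term for term in triple
    )
    store_add(packed)
```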
@nicholascar this is another thing I would consider for a 6.0.0 release! I am not sure the work here is even sensible as a starting point any more, but making a unified "sink" object across parsers seems like a good idea.
@gromgull yes I’ve seen this work and agree: unified would be good! I’ve tagged it for 6.0.0 now so it’s on the radar. Might be one of those good architectural tidy-ups once things like ditching Py2 and perhaps the graph IDs work have been actioned.
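A unified sink, as suggested in the exchange above, might look roughly like the sketch below. The `Sink` base class, its method names, and `GraphSink` are all hypothetical, not existing RDFLib API; the idea is simply that every parser would emit into one small interface, with a Graph being just one possible implementation of it.

```python
# Hypothetical unified sink interface (illustrative only, not RDFLib API).
from abc import ABC, abstractmethod


class Sink(ABC):
    """Receives parser events one at a time."""

    def start(self):
        """Called once before the first triple."""

    @abstractmethod
    def triple(self, s, p, o):
        """Called for every parsed triple."""

    def end(self):
        """Called once after the last triple."""


class GraphSink(Sink):
    """Adapter giving the classic behaviour: accumulate into a Graph."""

    def __init__(self, graph):
        self.graph = graph

    def triple(self, s, p, o):
        self.graph.add((s, p, o))
```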
This is a very much incomplete branch for reworking the interface between graphs and parsers. By introducing a new Sink object, it becomes possible to write streaming RDF processors that handle triples "as they come in"; since you do not store them all in a graph, you can work on files much larger than what fits in memory.
As usual, a pull-request to trigger Travis.
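As a concrete illustration of the streaming idea: RDFLib's N-Triples parser already accepts a sink object exposing a `triple(s, p, o)` method, so something close to this workflow is possible today. The class has moved and been renamed across versions (it is `W3CNTriplesParser` in 6.x), so treat the import as version-dependent; `CountingSink` and the file name are made up for the example.

```python
# Hedged sketch: stream a large N-Triples file through a custom sink
# without ever materialising a Graph in memory.
from rdflib.plugins.parsers.ntriples import NTriplesParser


class CountingSink:
    """Counts triples as they stream past; retains nothing."""

    def __init__(self):
        self.length = 0

    def triple(self, s, p, o):
        self.length += 1  # replace with real per-triple processing


sink = CountingSink()
parser = NTriplesParser(sink)
with open("huge-file.nt", "rb") as f:
    parser.parse(f)
print(f"saw {sink.length} triples")
```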