readxml.parse slow on HistFitter workspace #1687

Answered by kratsg
gollumben asked this question in Ideas

Ok, the problem is partly in uproot and partly in pyhf. In the uproot code (thanks @jpivarski!), a damerau_levenshtein function is called whenever we hit a missing key, and it takes a long time because the number of keys in this file is very large (https://github.com/scikit-hep/uproot4/blob/85f219a36e76dffc18da4756227a7beb760657a0/src/uproot/_util.py#L810-L858).
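To give a feel for why a single miss gets expensive, here is a minimal sketch of a fuzzy "did you mean" lookup. It is not uproot's code: `osa_distance` is the restricted (optimal string alignment) variant of Damerau-Levenshtein, and `closest_keys` is a hypothetical helper name. The point is only that every miss compares the missing name against every key in the file, and each comparison costs time proportional to the product of the two string lengths.

```python
from typing import List


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # swap of two adjacent characters counts as one edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def closest_keys(missing: str, keys: List[str], n: int = 3) -> List[str]:
    """Hypothetical helper: rank ALL keys by distance to the missing name."""
    return sorted(keys, key=lambda k: osa_distance(missing, k))[:n]
```

With a very large number of keys, and one such scan per histogram that is looked up under the wrong name, the total cost adds up quickly.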

In pyhf, when a histogram cannot be retrieved by its name alone and the full path has to be tried as well, the first lookup raises a (slow) DeserializationError, which pyhf catches as an expected exception. We need to change the way we check whether a key exists in the file. This is a bug.
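A rough sketch of the kind of change meant here, using a plain-Python stand-in rather than uproot or pyhf themselves. `SuggestingDirectory`, `get_hist_with_try` and `get_hist_with_check` are hypothetical names, and `difflib.get_close_matches` stands in for the fuzzy matching. Whether a membership test is actually cheap on a real uproot directory depends on uproot's implementation, so treat this as an illustration of the pattern only.

```python
import difflib


class SuggestingDirectory(dict):
    """Hypothetical stand-in for a file directory whose misses are expensive."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.expensive_misses = 0

    def __missing__(self, key):
        # Called by dict.__getitem__ on a miss; builds a "did you mean" hint
        # by scanning every key, which is the slow part.
        self.expensive_misses += 1
        hints = difflib.get_close_matches(key, list(self), n=3)
        raise KeyError(f"{key!r} not found; did you mean {hints}?")


def get_hist_with_try(directory, path, name):
    """Try the bare name, fall back to the full path on an exception."""
    try:
        return directory[name]
    except KeyError:
        return directory[f"{path}/{name}"]


def get_hist_with_check(directory, path, name):
    """Test membership first so the expensive miss path is never triggered."""
    if name in directory:
        return directory[name]
    fullpath = f"{path}/{name}"
    if fullpath in directory:
        return directory[fullpath]
    raise KeyError(f"neither {name!r} nor {fullpath!r} found")
```

In the stand-in, the `in` check is an ordinary hash lookup and never reaches `__missing__`, so a histogram that only exists under its full path costs no fuzzy matching at all.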

Replies: 3 comments 17 replies

Answer selected by kratsg
5 participants