Slicing a large histogram is slow #644
@pfackeldey This is a bit different; vectorizing `__getitem__` refers to something else, and slices are fine. The second loop will always outperform the first loop above, since it is doing a lot less. The second loop is not even allocating new memory; it is just a view of the same memory with different offsets and strides. A new histogram always needs new memory (with flow bins a view is logically impossible, and without flow bins it hasn't been worth adding).

I think we should do a bit of profiling here to see if anything could be improved (4 seconds does seem a little overboard), and while I've been very reluctant to add too many axis types, maybe we could have a …

This might be something that would be solved by boostorg/histogram#275, since currently we have to work around this, which might be why it's slow.

PS: Originally I started to say …
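To make the copy-versus-view distinction concrete, here is a minimal sketch (an illustration, not code from this thread; axis sizes are arbitrary) showing that indexing the histogram allocates fresh storage, while indexing its `.view()` only reinterprets the existing memory with new offsets and strides:

```python
import boost_histogram as bh
import numpy as np

h = bh.Histogram(bh.axis.Regular(100, 0, 1), bh.axis.Regular(100, 0, 1))

# Slicing the histogram itself builds a new histogram: new memory.
sliced = h[:, 0:50]

# Slicing the NumPy view is just new offsets/strides into h's memory.
v = h.view(flow=False)[:, :50]

assert not np.shares_memory(sliced.view(flow=False), h.view(flow=False))
assert np.shares_memory(v, h.view(flow=False))
```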
FYI, running line_profiler:

…

Shows 100% of the time in this line:

…

So it looks like this is spending its time in Boost.Histogram. But, turning on logging, I see:

…

Which means that …

Ehh, not much better. In fact, due to the size, this is literally how much time it takes to make a …

So this can be fixed in this special case for now by skipping the copy if picking is performed, since that will copy again anyway (but a much smaller histogram!):

…
Much better for this special case; mixing slicing and picking will drop you down to the original, slower code.

@HDembinski, if Boost.Histogram supported picking (ideally fused with other operations), then this would be simpler and faster. Even if it's not fused, "picking" is a much cheaper operation, so ideally it should be done first rather than last. I could eventually look into reversing the order here, but it complicates (or at least requires me to rewrite) the manipulation code.

(PS: empty reduce seems a bit slower than it should be, too.)
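The actual patch isn't preserved above; as a rough sketch of the special case being described (a hypothetical helper, not boost-histogram internals, assuming every index is either an integer pick or a full slice, and ignoring flow bins), it amounts to building the small histogram straight from a view instead of copying the full histogram first:

```python
import boost_histogram as bh

def pick_without_full_copy(h, indices):
    """Hypothetical sketch, not the real boost-histogram code path.

    indices: one entry per axis, each an int (pick) or slice(None).
    Flow bins along the kept axes are dropped for simplicity.
    """
    if any(isinstance(ix, int) for ix in indices):
        # Keep only the axes that are not picked away.
        keep = [ax for ax, ix in zip(h.axes, indices) if not isinstance(ix, int)]
        small = bh.Histogram(*keep)
        # Copy only the picked slice, not the whole histogram.
        small.view(flow=False)[...] = h.view(flow=False)[tuple(indices)]
        return small
    return h[tuple(indices)]  # no picking: fall back to the normal path
```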
I don't think I can fix this by implementing picking. The core issue is that the first call does a reduce and allocates memory for another histogram, which is expensive. The second call just creates a view into the original memory, which is very fast. To make this fast, I would have to implement histogram views. It could be done with a new storage type, I think, but it is not easy.
Thanks for creating a new issue!
Picking reduces the size of the histogram without extra computation, so picking first cuts the size and computation down for the reduce, which is potentially expensive (not sure why it's 3x slower than a copy when nothing is done). If Boost.Histogram had a separate pick, then I'd still have to do the harder part of this reorganization anyway: the pick coming first has to modify the indices of the reduce afterwards. If it were all part of a reduce, then you'd be doing that for me. ;)

Histogram views can't be done with flow bins, at least not without storing the flow bins somewhere else. The view method above will always be a bit faster, but that's okay; I'm mostly worried about reducing the 4 second time to under a millisecond, not about the final 300 µs to 20 µs from not making a copy. Eventually, we could possibly set up a storage with externally defined memory, but given the flow bin issue, it would not save much, if anything, in most cases.
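To make the ordering point concrete, here is an illustrative sketch (not from the thread; the axis sizes and the choice of `rebin` as the reduce operation are arbitrary) of why picking first shrinks the work the reduce has to do:

```python
import boost_histogram as bh

h = bh.Histogram(
    bh.axis.IntCategory(list(range(1000))),  # e.g. a dataset axis
    bh.axis.Regular(100, 0, 1),
    bh.axis.Regular(100, 0, 1),
)

# Reduce first, pick last: the rebin processes all ~10M bins.
slow = h[:, :: bh.rebin(2), :][0, :, :]

# Pick first, reduce last: only a 100 x 100 histogram gets rebinned.
fast = h[0, :, :][:: bh.rebin(2), :]
```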
Hi @henryiii, @HDembinski,

I assume the following is related; if not, please correct me and I'll open a fresh issue...
We noticed in the scope of our analysis that `__getitem__` is a performance "hurdle" for high-dimensional histograms (imagine: a dataset axis of O(1000) datasets, a category axis of O(100) categories, and a systematic axis of O(100) shifts). I will put a snippet here, which makes the performance visible:
Currently we use the second option, since on a larger analysis scale, with multiple of these huge histograms, this makes the difference between O(hours) and O(seconds) for histogram manipulation, such as grouping datasets into physics processes. However, the first one is (obviously) a lot more convenient to use.
I think this would be a major improvement, especially for the usability of `hist` and `boost_histogram` for large-scale analysis.

Best, Peter
Originally posted by @pfackeldey in #149 (comment)