Compute and use statistics on the whole dataset #231
Comments
There was talk of using Blocks Aggregators for this. It's a good idea, though we should think about how to store this. @rizar, thoughts? On Tue, Sep 15, 2015 at 3:05 PM, Francesco [email protected] wrote:
I am not sure I understand what kind of support from Fuel you guys want. Yes, statistics can be computed by iterating over the dataset. Indeed, On the other hand, I find it not very nice that I cannot use Fuel to create a whitened version of my dataset.
This is what I would like from Fuel:
@fvisin I think we should split this ticket up. We should try and tackle the offline preprocessor first, as it is potentially useful without the other two pieces. My next priority would be accompanying transformers that can use these statistics to do things like #242 (your number 3), so at least the user can store this stuff manually while we figure out 2. We should design the API such that it can be used as an optional step at dataset conversion time, so you can say something like "also, store a whitening matrix for this
@dwf I agree on everything. The transformers and the offline preprocessor can indeed be tackled independently, provided that we design them keeping in mind that we want them to work together at some point. Having transformers that use global statistics could be very handy even without the offline preprocessor, as it is quite common to compute and store global statistics, and I would expect people to already have them somewhere for a bunch of datasets. For example, I pickled the pixel mean and variance of ImageNet when I worked on it, and I could easily plug them into Fuel with such a Transformer.
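To make the idea of a transformer that consumes precomputed global statistics concrete, here is a minimal sketch. It deliberately does not use Fuel's `Transformer` API: `standardize_batches` is a hypothetical plain generator, and the `mean`, `std`, and `eps` parameters are assumptions made for illustration. The statistics could come from anywhere, for instance a pickle file the user already has.

```python
import numpy as np


def standardize_batches(batches, mean, std, eps=1e-8):
    """Yield each batch standardized with precomputed global statistics.

    `batches` is any iterable of arrays shaped (batch_size, ...);
    `mean` and `std` must broadcast against a single batch.
    This is a hypothetical helper, not part of Fuel.
    """
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        # eps guards against division by zero for constant features.
        yield (batch - mean) / (std + eps)
```

Because the statistics are passed in rather than computed, the same wrapper would work whether they were stored manually or produced by an offline preprocessing step.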
To the best of my knowledge, it is not currently possible to compute statistics on the whole dataset, e.g., per-pixel mean and standard deviation. This wouldn't fit well in the `Transformer` class, as it is not an on-the-fly transformation of the data, so I think it should be a separate class.

Ideally, this class could be applied either to a `Stream` coming from a `Dataset` or to one coming from a `Transformer` (I hope my terminology is sound here; correct me if I am wrong), as in some cases the data must be preprocessed before statistics can be collected.

This issue naturally pairs with #230, as I can imagine one would want to save the result of this computation in a (potentially new) dataset. In this case, it might be convenient to define how these statistics should be saved in an h5 file, to enforce consistency among the datasets.
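As an illustration of the kind of whole-dataset computation meant here, the following is a rough sketch of accumulating per-pixel mean and standard deviation in a single pass over batches, without loading the full dataset into memory. It assumes only that the data arrives as an iterable of NumPy arrays shaped `(batch_size, ...)`; `streaming_mean_std` is a hypothetical name, not an existing Fuel function. It merges per-batch statistics with the pairwise update of Chan et al., which is more stable than summing squares.

```python
import numpy as np


def streaming_mean_std(batches):
    """Return per-feature (mean, std) over an iterable of batches.

    Each batch has shape (batch_size, ...); statistics are taken over
    the first axis. The std is the population std (ddof=0).
    Hypothetical helper, not part of Fuel.
    """
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations from the mean
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        n_b = batch.shape[0]
        if n_b == 0:
            continue
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        if mean is None:
            count, mean, m2 = n_b, mean_b, m2_b
        else:
            # Merge the running statistics with this batch's statistics.
            delta = mean_b - mean
            new_count = count + n_b
            mean = mean + delta * n_b / new_count
            m2 = m2 + m2_b + delta ** 2 * count * n_b / new_count
            count = new_count
    if count == 0:
        raise ValueError("no examples were provided")
    return mean, np.sqrt(m2 / count)
```

Since the function only needs an iterable of arrays, it could in principle be fed from a raw stream or from one that has already been preprocessed.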