[Re] Network Deconvolution #89

Closed

rochanaro opened this issue Sep 19, 2024 · 40 comments

@rochanaro

Original article: C. Ye, M. Evanusa, H. He, A. Mitrokhin, T. Goldstein, J. A. Yorke, C. Fermüller, and Y. Aloimonos. “Network Deconvolution.” In: ICLR (2020).

PDF URL: https://github.com/lamps-lab/rep-network-deconvolution/blob/master/article.pdf
Metadata URL: https://github.com/lamps-lab/rep-network-deconvolution/blob/master/metadata.yaml
Code URL: https://github.com/lamps-lab/rep-network-deconvolution

Scientific domain: Machine Learning
Programming language: Python
Suggested editor:

@rougier
Member

rougier commented Oct 14, 2024

Thanks for your submission. We'll assign an editor soon.

@rougier
Member

rougier commented Oct 14, 2024

By the way, is this submission part of the ICLR reproducibility challenge? If yes, are there any open reviews somewhere?

@rochanaro
Author

Thanks for your submission. We'll assign an editor soon.

Thank you!

@rochanaro
Author

By the way, is this submission part of the ICLR reproducibility challenge? If yes, are there any open reviews somewhere?

No, our work was not submitted to the ICLR reproducibility challenge.

@rougier
Member

rougier commented Jan 21, 2025

Very sorry for such a long delay; hopefully things will get better for 2025.
I was asking the question because the format of the PDF is very similar to that of the ICLR challenge. Note that this is not a problem at all; the idea was to re-use reviews if they were available.

I'll edit your submission and assign reviewers soon, hopefully.

In the meantime, can you have a look at other submissions and propose yourself to review?

@rougier rougier self-assigned this Jan 21, 2025
@rougier
Member

rougier commented Jan 21, 2025

@birdortyedi @MiWeiss Could you review this submission?

@rochanaro
Author

Very sorry for such a long delay; hopefully things will get better for 2025. I was asking the question because the format of the PDF is very similar to that of the ICLR challenge. Note that this is not a problem at all; the idea was to re-use reviews if they were available.

I'll edit your submission and assign reviewers soon, hopefully.

In the meantime, can you have a look at other submissions and propose yourself to review?

Thank you for the update! We understand how busy things can get.
Yes, for organizing the content, we tried to follow the structure of the MLRC 2022 template, simply to ensure clarity.
I’ll certainly take a look at other submissions and will propose myself as a reviewer where I can contribute.

@MiWeiss

MiWeiss commented Jan 23, 2025

Sounds like an interesting paper and a good match for me. Unfortunately, though, I won't be able to review a paper in the coming months.

@rougier

@rougier
Member

rougier commented Feb 17, 2025

@ReScience/reviewers We're looking for two reviewers (Machine learning / Python / ICLR), any takers?

@rougier
Member

rougier commented Feb 17, 2025

@rochanaro Don't hesitate to post here to ask for updates.

@alchemi5t

@rougier I would like to review this one.

@rougier
Member

rougier commented Feb 17, 2025

@alchemi5t Many thanks! You can start the review now and let's target mid-March (or sooner if you can)

@jsta
Member

jsta commented Feb 18, 2025

@rougier I am interested in reviewing this one if a second review is needed.

@rougier
Member

rougier commented Feb 19, 2025

@jsta That would be great, many thanks. Do you think mid-March would work for you?

@jsta
Member

jsta commented Feb 19, 2025

Yes

@jsta
Member

jsta commented Feb 26, 2025

@rougier I'm not getting responses to PRs and issues in the code repo. Can we delay the due date pending a response from @rochanaro or team?

@rochanaro
Author

@rougier I'm not getting responses to PRs and issues in the code repo. Can we delay the due date pending a response from @rochanaro or team?

@jsta Apologies for the delayed response. We encountered an issue with notification delivery for pull requests and issues. However, we have now addressed all queries received thus far.

@rougier
Member

rougier commented Mar 12, 2025

@alchemi5t @jsta Any progress on your reviews?

@alchemi5t

@rougier I'll have my review by the end of the week.

@jsta
Member

jsta commented Mar 12, 2025

@rougier Yes, I have some general comments prepared. I will post them once I have finished a couple of tests to verify general behavior. I do not plan to run all the tests due to computational constraints.

@alchemi5t

Dear Authors,

Here's my review of your work.

Review of "[Re] Network Deconvolution"

This reproduction study provides a thorough evaluation of the network deconvolution technique introduced by Ye et al. (2020). After examining both the original paper and this reproduction, I find that the authors have validated the primary claim about network deconvolution for the most part. While there are many cases where network deconvolution improves model performance compared to batch normalization, the reproduction results show a few exceptions that contradict the original paper's universal claim. For example, in the ResNet-18 architecture with CIFAR-100 at 100 epochs, batch normalization (97.42%) actually outperformed network deconvolution (94.31%), which contradicts both the original paper's results and its central claim.
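
For readers who have not seen the original paper: the core idea of network deconvolution is to whiten the im2col patches of each layer's input (remove the mean and pairwise correlations) before applying the convolution weights, instead of normalizing per-channel statistics as batch normalization does. The following is my own rough PyTorch sketch of that idea, for intuition only; it is not the authors' implementation, which (as I understand it) uses grouped covariances, running statistics, and approximate inverse square roots for efficiency. The function name deconv_conv2d and the full eigendecomposition are my simplifications.

import torch
import torch.nn.functional as F

def deconv_conv2d(x, weight, bias=None, stride=1, padding=1, eps=1e-5):
    # x: (N, C_in, H, W); weight: (C_out, C_in, k, k)
    c_out, c_in, k, _ = weight.shape
    # im2col: unfold the input into patches of dimension D = C_in * k * k
    patches = F.unfold(x, kernel_size=k, stride=stride, padding=padding)  # (N, D, L)
    n, d, l = patches.shape
    flat = patches.permute(0, 2, 1).reshape(-1, d)                        # (N*L, D)
    flat = flat - flat.mean(dim=0, keepdim=True)                          # center
    cov = flat.t() @ flat / flat.shape[0] + eps * torch.eye(d, device=flat.device)
    # Whitening ("deconvolution") matrix: inverse square root of the patch covariance
    eigval, eigvec = torch.linalg.eigh(cov)
    whiten = eigvec @ torch.diag(eigval.clamp_min(eps).rsqrt()) @ eigvec.t()
    white = (flat @ whiten).reshape(n, l, d).permute(0, 2, 1)             # whitened patches
    out = weight.reshape(c_out, -1) @ white                               # convolution as a matmul
    h_out = (x.shape[2] + 2 * padding - k) // stride + 1
    w_out = (x.shape[3] + 2 * padding - k) // stride + 1
    out = out.reshape(n, c_out, h_out, w_out)
    if bias is not None:
        out = out + bias.reshape(1, -1, 1, 1)
    return out

# Example: y = deconv_conv2d(torch.randn(8, 3, 32, 32), torch.randn(64, 3, 3, 3))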

Strengths

  • Comprehensive testing: The authors tested 10 modern neural network architectures on CIFAR-10/100 and ImageNet datasets, over 3 runs
  • Training time analysis: The authors went beyond the original paper by analyzing computational overhead, showing that network deconvolution requires more training time

Key Observations

The few exceptions where BN seems to have outperformed ND have not been noted, and there is no analysis around them. Furthermore, this statement is not true given the results:
"The results show that the model performance with network deconvolution is always better than the model performance with batch normalization."

Another notable finding is that the reproduced accuracy values were often significantly higher than those reported in the original paper. For example:

  • For VGG-16 with CIFAR-100 at 100 epochs, the original paper reported 75.32% accuracy with network deconvolution, while the reproduction achieved 99.30%
  • Similar large improvements were observed across most architectures and datasets

The authors attribute this systematic improvement to:

  1. Advancements in numerical stability of libraries (NumPy 1.16.1 → 1.23.5, PyTorch 1.0 → 1.13)
  2. Improved optimization algorithms and parallelism in Tensorflow

While these explanations are plausible, the magnitude of improvement (sometimes exceeding 20%) suggests there might be additional factors at play that weren't fully investigated. Given that this improvement has pushed most accuracies (both ND and BN) to 99.xx, where the comparison comes down to differences on the order of 10^-2 (vs. the 1-2 points in the original paper), I expected deeper analysis and stronger evidence for these claims.

Code

As far as I understand, running full training is the only way to test the codebase. Due to lack of hardware I am unable to do so, but for anyone who can and would like to quickly verify the claims, releasing the trained weights at 100 epochs and a minimal script to run inference on them would be greatly helpful.

Opinion

This reproduction study provides valuable insights but reveals important discrepancies with the original paper's claims. While network deconvolution often outperforms batch normalization, the reproduction found notable exceptions.

The reproduction yielded substantially higher accuracy values for both techniques compared to those reported in the original paper. These significant discrepancies make it difficult to draw direct comparisons with the original results, and the proposed explanations for these differences (library improvements, optimization algorithms) remain speculative without rigorous empirical validation.

Rather than enhancing confidence in network deconvolution as a universal improvement over batch normalization, this reproduction suggests a more nuanced view: network deconvolution appears to be a viable alternative that performs better in many but not all scenarios. The authors' detailed reporting of computational costs and performance characteristics across architectures provides essential practical context for researchers considering which normalization technique to employ.

@jsta
Member

jsta commented Mar 19, 2025

Ok, I have completed my review. Please see the comments below. In addition to reading the paper and git repository, I verified that I could run 2 arbitrary architectures (vgg16 and pnasnetA) for 100 epochs for both the BN and ND cases, and got similar relative accuracy results.

major comments

  • Section 1, last paragraph, says "our study attempts to reproduce the results reported in the original paper with the most recent versions of software libraries." This is not accurate; for example, TensorFlow 2.12 is years old. Maybe change to "the most recent version of software libraries as of [some date]"?

minor comments

  • Section 4.1, first sentence, says "compared it against the state‐of‐the‐art against". This seems like a typo or grammar mistake.
  • Is there a typo in imagenet_single_experiment_densenet121.sh, should -a densenet121 -> -a densenet121d?

highly optional and/or minor curiosity questions

  • Section 5.1, first paragraph, says "[...] we averaged the results of three attempts and compared them with the original study’s reported values.". Was there variation among the attempts such that running each model/architecture combination that many times was worthwhile?
  • Figure 3, why were the reproduced values so much more accurate than the original for CIFAR-100 but not CIFAR-10 or ImageNet?
  • There is some information in the repository README that duplicates the paper. I wonder if you could remove information about the steps you took to create the repo and focus the README on how users will interact with it. Get them up and running quickly without wading through extraneous details. Conversely, the paper says that torchvision datasets are downloaded on-the-fly, but this is not in the README.
  • It would be nice to add a time argument to the sbatch files so users have a rough idea of how long they're expected to run.
  • It's strange that Table 8 lists the model names while Table 4 keys them by number code.

@rougier
Member

rougier commented Mar 24, 2025

@alchemi5t @jsta Many thanks for your detailed reviews.
@rochanaro Can you address the comments and specifically address @alchemi5t's concerns about performance? It seems the change in numerical stability might not be a plausible reason for the enhanced performance. Any idea on that? Would it be possible to rerun your code (without too much hassle) with the exact same stack as the original paper?

@rochanaro
Author

rochanaro commented Mar 24, 2025

@alchemi5t @jsta Many thanks for your detailed reviews. @rochanaro Can you address the comments and specifically address @alchemi5t's concerns about performance? It seems the change in numerical stability might not be a plausible reason for the enhanced performance. Any idea on that? Would it be possible to rerun your code (without too much hassle) with the exact same stack as the original paper?

Thank you @alchemi5t and @jsta for the reviews. @rougier It would be challenging to redo the experiment using the exact same dependencies as the original paper. During our study, we attempted to contact the original authors to obtain the precise library versions, as the original study repository does not mention them, but we did not receive a response. Therefore, we adopted the following approach: we used the latest available versions of each library as of 2020 and only opted for more recent versions when compatibility issues arose.

We will soon get back with answers, revisions, and updates addressing the above concerns in our upcoming response.

@rochanaro
Author

Full response to ReScience C reviews (submission #89)

Dear editor (@rougier) and reviewers (@alchemi5t, @jsta),

We appreciate the reviewers’ careful evaluation of our reproduction study. Below is our detailed response addressing each of the reviewers’ points:

Performance Discrepancies and Exceptional Cases

While there are many cases where network deconvolution improves model performance compared to batch normalization, the reproduction results show a few exceptions that contradict the original paper's universal claim. For example, in the ResNet-18 architecture with CIFAR-100 at 100 epochs, batch normalization (97.42%) actually outperformed network deconvolution (94.31%), which contradicts both the original paper's results and its central claim.

We appreciate the observation that batch normalization (BN) outperformed network deconvolution in certain cases (e.g., ResNet-18 on CIFAR-100 at 100 epochs), and we acknowledge these exceptions. Our study aimed to carefully reproduce the experiments rather than optimize hyperparameters or modify architectural choices. In the literature, such variations are not uncommon when replicating older experiments (see, e.g., Examining the Effect of Implementation Factors on Deep Learning Reproducibility, A Study on Reproducibility and Replicability of Table Structure Recognition Methods, Towards training reproducible deep learning models). We do not claim that network deconvolution universally outperforms BN; rather, our findings suggest that while ND often provides benefits, its effectiveness may be context dependent (explained in manuscript section 6 paragraph 4).

Systematic Accuracy Improvements

Another notable finding is that the reproduced accuracy values were often significantly higher than those reported in the original paper. For example:
• For VGG-16 with CIFAR-100 at 100 epochs, the original paper reported 75.32% accuracy with network deconvolution, while the reproduction achieved 99.30%
• Similar large improvements were observed across most architectures and datasets

We acknowledge the reviewer's observation regarding systematically higher accuracy values (e.g., VGG-16 with CIFAR-100) and agree on the importance of understanding these differences. We believe that several factors contribute to these improvements. Notably, advancements in numerical stability, as documented in recent studies (e.g., Recent advances and applications of deep learning methods in materials science, A Comprehensive Review of Deep Learning: Architectures, Recent Advances, and Applications), and enhancements in optimization algorithms may play a significant role.

Additionally, prior research supports the idea that updated frameworks can improve model performance. Mienye et al. (2024), Qin et al. (2024), and Vaishnav et al. (2022) demonstrated that deep learning models consistently achieve higher accuracy when trained with updated frameworks due to improvements in optimization techniques and regularization strategies. Similarly, Coakley et al. (2023) found that updated hardware and software environments alone can introduce accuracy differences of over 6%, highlighting the impact of implementation-level changes on reproducibility.

While the observed accuracy gains (e.g., 75% → 99% for VGG-16 on CIFAR-100) are substantial, we emphasize that both network deconvolution (ND) and batch normalization (BN) baselines improved comparably, suggesting that these gains are driven by systemic factors rather than being specific to any particular method. Although we have provided plausible explanations based on updates in software libraries, we acknowledge that these factors may not fully account for the magnitude of the observed differences. Further investigation into these aspects is warranted, but we reiterate that the primary objective was to assess reproducibility rather than to deconstruct every underlying cause of performance variation.

We provided detailed explanations in Section 6, Paragraphs 2 and 3, of the revised manuscript.

Sharing trained weights for verification

As far as I understand, running full training is the only way to test the codebase. Due to lack of hardware I am unable to do so, but for anyone who can and would like to quickly verify the claims, releasing the trained weights at 100 epochs and a minimal script to run inference on them would be greatly helpful.

We acknowledge the reviewer's (@alchemi5t) request and have taken steps to facilitate easy verification of our results. The trained model weights at 100 epochs have been made publicly available at https://osf.io/hp3ab/files/osfstorage, corresponding to the results presented in Tables 1 and 2. Additionally, we provide two minimal inference scripts (test_script_for_table_1.py and test_script_for_table_2.py) that allow users to reproduce our reported accuracy without requiring extensive computational resources. This ensures that our findings can be easily validated by downloading the weights and running a single command.

The steps have been explained in the newly added section, "To Validate the Results Using the Trained Weights (Direct Inference Without Training)" in the GitHub README.md file and in Section 5.4 of the revised manuscript.

For example, the following command can be used to verify results for the VGG-16 architecture on CIFAR-10 with batch normalization:

python test_script_table_1.py --arch vgg16 --dataset cifar10 --deconv False --model_path "checkpoints/cifar10_vgg16_BN.pth.tar"
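
For reference, the logic inside these scripts is essentially a standard evaluation loop: build the model, load the released checkpoint, and measure top-1 accuracy on the test split. The sketch below is a simplified illustration of that flow; the checkpoint key ("state_dict"), the normalization constants, and the hypothetical build_vgg16_bn helper are assumptions for illustration, not the exact contents of test_script_for_table_1.py.

import torch
import torchvision
import torchvision.transforms as T

def evaluate_checkpoint(model, model_path, device="cuda", batch_size=256):
    # Standard CIFAR-10 test set; these normalization constants are the commonly
    # used CIFAR-10 statistics and may differ from the repository's preprocessing.
    transform = T.Compose([
        T.ToTensor(),
        T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])
    testset = torchvision.datasets.CIFAR10(root="./data", train=False,
                                           download=True, transform=transform)
    loader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False)

    # Load the released weights; the "state_dict" key is an assumption about
    # how the checkpoint file is structured.
    checkpoint = torch.load(model_path, map_location=device)
    model.load_state_dict(checkpoint["state_dict"])
    model.to(device).eval()

    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

# Hypothetical usage, with the model built from the repository's architecture code:
# accuracy = evaluate_checkpoint(build_vgg16_bn(), "checkpoints/cifar10_vgg16_BN.pth.tar")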

Clarification on Software Versions (Reviewer 2’s Comment @jsta)

Section 1, last paragraph, says "our study attempts to reproduce the results reported in the original paper with the most recent versions of software libraries." This is not accurate; for example, TensorFlow 2.12 is years old. Maybe change to "the most recent version of software libraries as of [some date]"?

In response to Reviewer 2’s remark regarding our statement in Section 1, we have revised the language. The original phrase “with the most recent versions of software libraries” has been updated to “with the most recent versions of software libraries as of 2020” to prevent any misunderstanding (Section 1 Page 3). During our study, we attempted to contact the original authors to obtain the precise library versions used in their experiments, as the repository did not include this information. Unfortunately, we did not receive a response. Therefore, we adopted the following approach: we used the latest available versions of each library as of 2020 and opted for more recent versions only when compatibility issues arose. The details are explained in Section 4.5 of the revised manuscript.
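
As a small additional aid for reproducibility, the exact library versions present in a given environment can also be recorded programmatically so that "as of [date]" statements can be verified later. The snippet below is a generic sketch; the package list is simply the libraries discussed in this thread.

import importlib.metadata as metadata

# Print the installed versions of the key libraries used for a run.
for package in ("numpy", "torch", "torchvision", "tensorflow"):
    try:
        print(package, metadata.version(package))
    except metadata.PackageNotFoundError:
        print(package, "not installed")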

Summary and a list of changes made based on the reviewers’ and editor’s comments

*All the updates in the manuscript are highlighted in blue.

Reviewer 1 (@alchemi5t):

  • Made trained weights public and introduced two scripts for direct model inference without training (revised manuscript Section 5.4)
  • Explained our main objective and provided explanation on accuracy discrepancies (revised manuscript Section 6)

Reviewer 2 (@jsta):

  • major comments:
    • revised the sentence in the manuscript Section 1 on Page 3.
  • minor comments:
    • grammar mistake corrected (Section 4.1)
    • -a densenet121 in GitHub file imagenet_single_experiment_densenet121.sh isn’t a typo
  • highly optional and/or minor curiosity questions:
    • The results were similar for all attempts, so the variation was minimal.
    • Since these are three different datasets, dataset-inherent factors may play a role. However, we suspect that the higher accuracy in the reproduced results for CIFAR-100 compared to CIFAR-10 and ImageNet is primarily due to improvements in regularization techniques and optimization methods in newer versions of PyTorch. (revised manuscript Section 6)
    • Mentioned downloading the CIFAR-10 and CIFAR-100 datasets via torchvision in the README.md section “To reproduce results of our reproducibility study”, list Item 2
    • Added --time parameter for sbatch commands (updated in README.md section “Steps we have followed to reproduce the original study” list Items 5 and 6)
    • The intention was to include model names in the columns, but due to space constraints in Tables 4 and 5, we had to use number codes instead

Editor (@rougier):

  • Explained why it is not possible to use the exact same software stack as the original study in our reproducibility study (explained in the revised manuscript Section 4.5 and in this response)
  • Provided an explanation for Reviewer 1's concerns about accuracy discrepancies (revised manuscript Section 6, newly added content in blue, and in this GitHub response)

Revised Manuscript URL: https://github.com/lamps-lab/rep-network-deconvolution/blob/master/article.pdf
Metadata URL: https://github.com/lamps-lab/rep-network-deconvolution/blob/master/metadata.yaml
Code URL: https://github.com/lamps-lab/rep-network-deconvolution

@rougier
Member

rougier commented Apr 11, 2025

@rochanaro Thanks for your detailed answer. @jsta I guess your thumbs-up means you're satisfied with the answer and the revised manuscript. @alchemi5t are you OK with the answer and the revised manuscript?

@alchemi5t

@rougier yes, I am satisfied with the response.

@jsta
Member

jsta commented Apr 11, 2025

@rougier yes, I am satisfied with the response.

@rochanaro
Author

Thank you @alchemi5t and @jsta, we appreciate the replies.

@rougier
Member

rougier commented Apr 17, 2025

Great, then we can accept your replication! Congratulations.

For the actual publication, I would need a link to a GitHub repo with the sources of your article. And you'll need to save your code repo on Software Heritage so as to get a SWHID.

@rochanaro
Author

rochanaro commented Apr 17, 2025

Thank you, @rougier. The requested items are listed below. Please let us know if anything additional is needed.

GitHub link for the article source files: https://github.com/lamps-lab/rep-network-deconvolution/tree/master/documents/manuscript
SWHID: swh:1:dir:680efae9c7c628f93ee03731e73c34cf00b0e245

swh:1:dir:680efae9c7c628f93ee03731e73c34cf00b0e245;
origin=https://github.com/lamps-lab/rep-network-deconvolution;
visit=swh:1:snp:6946519d5ef5dd5312c1da6543218e512c2b0633;
anchor=swh:1:rev:b06aac9c1ea4bf38d2dd86eaf63b7dd24f10f561

@rougier
Member

rougier commented Apr 24, 2025

Sorry I'm late in handling your article. Can you fill in the metadata.yaml with as much information as you can? By the way, you may need to use the new template that is available here: https://github.com/ReScience/template (with an SWH entry).

For editor/reviewers, you'll find ORCID numbers at https://rescience.github.io/board/

@rochanaro
Author

rochanaro commented Apr 25, 2025

Dear Editor (@rougier),

We have updated the metadata.yaml file and the template with the SWH entry. Please let us know if there is anything additional we can do.

Manuscript source files: https://github.com/lamps-lab/rep-network-deconvolution/tree/master/documents/manuscript
metadata.yaml : https://github.com/lamps-lab/rep-network-deconvolution/blob/master/documents/manuscript/metadata.yaml
Article PDF: https://github.com/lamps-lab/rep-network-deconvolution/blob/master/article.pdf

Thank you!

@rougier
Member

rougier commented Apr 28, 2025

Thanks! The sandbox version is online at https://sandbox.zenodo.org/record/203035. Can you check that everything is OK (including links pointing to the right place) and let me know, so that I can publish the final version?

@rochanaro
Author

Dear Editor (@rougier)

We checked the sandbox version, and everything seems to be in order.

Thank you.

@rougier
Member

rougier commented May 2, 2025

Great, thanks. Your paper is online at https://zenodo.org/record/15172014/ and will soon appear on the ReScience website.
Congratulations!

@rochanaro
Author

Dear Editor (@rougier),

Kindly note that there has been a mistake in the previous comment. The link https://zenodo.org/record/15172014/ refers to a different article, not our reproducibility study.

@rougier
Member

rougier commented May 2, 2025

Sorry, I meant https://zenodo.org/records/15321683

@rochanaro
Author

Dear @rougier, @alchemi5t, @jsta

Thank you very much for your time, guidance, and support throughout the review process. We truly appreciate your efforts in handling our manuscript.

@rougier
Member

rougier commented May 2, 2025

It's online: https://rescience.github.io/read/#volume-10-2025

@rougier rougier closed this as completed May 2, 2025