I have been working on reproducing the results in your paper "DocLayout-YOLO: Enhancing Document Layout Analysis Through Diverse Synthetic Data and Global-to-Local Adaptive Perception" and testing the published models for their speed performance (FPS). However, I’ve observed discrepancies between the FPS reported in the paper and the results from my experiments.
Below, I’ve detailed my setup, experiment results, and observations.
Setup Details
- Framework/Environment: same as specified in your repository
- Script: demo.py from the repository
- Hardware: a single NVIDIA A100 GPU (as in the paper)
Experiment Results
| Model | Preprocess Time (ms) | Inference Time (ms) | Postprocess Time (ms) | Overall Time (ms/image) | FPS |
|---|---|---|---|---|---|
| docstructbench_imgsz1024.pt | 4.4 | 23.7 | 0.8 | 28.9 | 34.6 |
| doclaynet_imgsz1120_from_scratch.pt | 4.2 | 22.1 | 0.8 | 27.1 | 36.9 |
| doclaynet_imgsz1120_docsynth_pretrain.pt | 4.4 | 22.5 | 0.8 | 27.7 | 36.1 |
| d4la_imgsz1600_from_scratch.pt | 4.3 | 22.4 | 0.8 | 27.5 | 36.4 |
For comparison, the FPS reported in the paper for DocLayout-YOLO (on DocStructBench) is 85.5 FPS. My results are significantly lower, even when using a single A100 GPU as mentioned in the paper.
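For reference, here is a minimal sketch of how the per-image numbers above were collected, assuming demo.py-style single-image `predict()` calls through the `doclayout_yolo` package and that results expose Ultralytics-style `speed` timings; the image path is a placeholder and the exact script may differ slightly.

```python
# Minimal sketch (assumptions: doclayout_yolo keeps the Ultralytics-style
# predict() API and Results.speed timing fields, as demo.py does).
from doclayout_yolo import YOLOv10

model = YOLOv10("docstructbench_imgsz1024.pt")
result = model.predict("sample_page.png", imgsz=1024, device="cuda:0")[0]  # placeholder image

# Per-stage times for this image, in milliseconds.
pre = result.speed["preprocess"]
inf = result.speed["inference"]
post = result.speed["postprocess"]

overall = pre + inf + post              # "Overall Time (ms/image)" column
print(f"End-to-end FPS: {1000 / overall:.1f}")   # "FPS" column
```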
Questions
Are there any additional optimizations (e.g., ONNX Runtime, TensorRT, or mixed precision) that were applied during your FPS testing?
Did the FPS measurement in the paper exclude preprocessing and postprocessing times?
Was the reported FPS tested on a different dataset (e.g., DocStructBench) that might have simpler document layouts compared to DocLayNet? If so, would such a difference in data be expected to cause this speed discrepancy?
Do you have any explanation for the discrepancies between my results and the FPS reported in the paper?
I’d appreciate any guidance or clarifications regarding this discrepancy. Let me know if further details are required!
Thank you for your excellent work on this project.
@eslamahmed235 Hello, thanks for your interest in our work!
Explanation of FPS reported in the paper
The FPS reported in the paper refers to the pure inference latency in batch mode, measured during evaluation on a clean node. This number is easy to obtain across the various frameworks we compare against, such as YOLO (Ultralytics), DiT and LayoutLMv3 (Detectron2), and DINO (MMDetection). YOLO-style end-to-end latency, by contrast, lacks a standardized measurement protocol and is heavily influenced by the device, the operations included, and even the package version, so to keep the comparison fair we adopted this simple but more realistic measuring approach.
The FPS in the paper can be reproduced directly from the evaluation process; a hedged sketch follows below. For example, evaluating on DocLayNet reports an inference time of about 12.2 ms per image, i.e., 1000/12.2 ≈ 82 FPS, which is slightly lower than the DocStructBench figure because this model uses a larger input resolution of 1120.
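A minimal sketch of that evaluation, assuming the `doclayout_yolo` fork keeps the Ultralytics-style `.val()` API and its `speed` metrics; the dataset config path and batch size below are placeholders, not the exact command from the repository.

```python
# Hedged sketch: paper-style FPS from the batched evaluation pipeline.
# Assumptions: Ultralytics-style .val() API in the doclayout_yolo fork;
# "doclaynet.yaml" is a placeholder path to the dataset config.
from doclayout_yolo import YOLOv10

model = YOLOv10("doclaynet_imgsz1120_docsynth_pretrain.pt")
metrics = model.val(
    data="doclaynet.yaml",  # placeholder dataset config
    imgsz=1120,
    batch=16,               # batched evaluation, unlike single-image demo.py
    device="cuda:0",
)

# The paper's FPS counts only the inference term reported by evaluation.
inference_ms = metrics.speed["inference"]   # e.g. ~12.2 ms/image
print(f"FPS: {1000 / inference_ms:.1f}")
```

Measured this way, only the batched forward pass is counted, which is why the result is much higher than the end-to-end single-image numbers above.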
Are there any additional optimizations (e.g., ONNX Runtime, TensorRT, or mixed precision) that were applied during your FPS testing?
No. These kinds of optimizations, although widely adopted across the YOLO series, are hard to reproduce and compare fairly, so we did not use them.
Did the FPS measurement in the paper exclude preprocessing and postprocessing times?
Yes
Was the reported FPS tested on a different dataset (e.g., DocStructBench) that might have simpler document layouts compared to DocLayNet? If so, would such a difference in data be expected to cause this speed discrepancy?
As far as I am aware, this has little influence on the speed.