
FPS Discrepancy in DocLayout-YOLO Inference on A100 GPU #74

Open
eslamahmed235 opened this issue Dec 23, 2024 · 1 comment
eslamahmed235 commented Dec 23, 2024

Hello,

I have been working on reproducing the results in your paper "DocLayout-YOLO: Enhancing Document Layout Analysis Through Diverse Synthetic Data and Global-to-Local Adaptive Perception" and testing the published models for their speed performance (FPS). However, I’ve observed discrepancies between the FPS reported in the paper and the results from my experiments.

Below, I’ve detailed my setup, experiment results, and observations.


Setup Details

  • Hardware: Single NVIDIA A100 GPU (NC24ad_A100_v4)
  • Dataset: DocLayNet (test set)
  • Models Used: Models published on Hugging Face
  • Script Used: demo.py from the repository
  • Framework/Environment: Same as specified in your repository

Experiment Results

| Model | Preprocess (ms) | Inference (ms) | Postprocess (ms) | Overall (ms/image) | FPS |
| --- | --- | --- | --- | --- | --- |
| docstructbench_imgsz1024.pt | 4.4 | 23.7 | 0.8 | 28.9 | 34.6 |
| doclaynet_imgsz1120_from_scratch.pt | 4.2 | 22.1 | 0.8 | 27.1 | 36.9 |
| doclaynet_imgsz1120_docsynth_pretrain.pt | 4.4 | 22.5 | 0.8 | 27.7 | 36.1 |
| d4la_imgsz1600_from_scratch.pt | 4.3 | 22.4 | 0.8 | 27.5 | 36.4 |
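
The Overall and FPS columns follow directly from the per-stage times. A minimal sketch of the arithmetic, using the values from the first row above:

```python
def end_to_end_fps(preprocess_ms, inference_ms, postprocess_ms):
    """FPS when preprocessing and postprocessing count toward latency."""
    total_ms = preprocess_ms + inference_ms + postprocess_ms
    return total_ms, 1000.0 / total_ms

# First row: docstructbench_imgsz1024.pt
total, fps = end_to_end_fps(4.4, 23.7, 0.8)
print(f"{total:.1f} ms/image -> {fps:.1f} FPS")  # 28.9 ms/image -> 34.6 FPS
```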

For comparison, the FPS reported in the paper for DocLayout-YOLO (on DocStructBench) is 85.5 FPS. My results are significantly lower, even when using a single A100 GPU as mentioned in the paper.


Questions

1. Were any additional optimizations (e.g., ONNX Runtime, TensorRT, or mixed precision) applied during your FPS testing?
2. Did the FPS measurement in the paper exclude preprocessing and postprocessing times?
3. Was the reported FPS measured on a different dataset (e.g., DocStructBench) with simpler document layouts than DocLayNet? If so, would a speed discrepancy of this size be expected?
4. Do you have any other explanation for the gap between my results and the FPS reported in the paper?

I’d appreciate any guidance or clarification regarding this discrepancy. Let me know if further details are required!

Thank you for your excellent work on this project.

JulioZhao97 (Collaborator) commented Dec 23, 2024

@eslamahmed235 Hello, thanks for your interest in our work!

  1. Explanation of the FPS reported in the paper
    The FPS reported in the paper is the pure inference latency in batch mode during evaluation on a clean node, which is easy to obtain across frameworks such as YOLO (Ultralytics), DiT and LayoutLMv3 (Detectron2), and DINO (MMDetection). YOLO-style latency has no standardized measurement protocol and can be heavily influenced by the device, the operations included, and even package versions, so to conduct a fair comparison we took this simple but more realistic measuring approach.
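
As a rough illustration of this measuring style (a generic sketch, not the repository's actual timing code), pure inference latency can be timed in batch mode with warmup iterations, excluding data loading and postprocessing; `run_batch` here is a placeholder for any callable that runs the model on a pre-loaded batch:

```python
import time

def batched_inference_latency_ms(run_batch, batches, warmup=3):
    """Average per-image inference latency, timing only the model forward
    pass on pre-loaded batches (warmup iterations excluded).
    On a GPU you would also synchronize (e.g. torch.cuda.synchronize())
    before reading the clock, since kernel launches are asynchronous."""
    for batch in batches[:warmup]:
        run_batch(batch)                 # warm up caches / CUDA kernels
    start = time.perf_counter()
    n_images = 0
    for batch in batches[warmup:]:
        run_batch(batch)
        n_images += len(batch)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / n_images

# FPS in this convention is simply 1000 / latency_ms.
```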

The FPS in the paper can be reproduced through the evaluation process; for example, on DocLayNet you can evaluate with

python val.py --data doclaynet --model doclayout_yolo_doclaynet_imgsz1120_docsynth_pretrain.pt --device 0 --batch-size 64

(screenshot: val.py evaluation output showing per-image inference latency)

which gives an FPS of 1000/12.2 ≈ 82, slightly slower than the paper's number since this model uses a larger input resolution of 1120.
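
Putting the two conventions side by side with the numbers from this thread (the 12.2 ms batched inference latency from val.py, and the per-image stage times from the demo.py measurements earlier) shows that the timing convention alone accounts for most of the gap:

```python
# Numbers reported in this thread; illustrative only.
batched_inference_ms = 12.2                   # val.py, batch mode, inference only
pre_ms, infer_ms, post_ms = 4.4, 23.7, 0.8    # demo.py, per image, all stages

paper_style_fps = 1000.0 / batched_inference_ms            # ~82 FPS
end_to_end_fps = 1000.0 / (pre_ms + infer_ms + post_ms)    # ~34.6 FPS
print(f"inference-only (batched): {paper_style_fps:.1f} FPS")
print(f"end-to-end (per image):   {end_to_end_fps:.1f} FPS")
```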

  1. Are there any additional optimizations (e.g., ONNX Runtime, TensorRT, or mixed precision) that were applied during your FPS testing?
    No. Such optimizations, though widely adopted in the YOLO series, are hard to reproduce and compare fairly.

  2. Did the FPS measurement in the paper exclude preprocessing and postprocessing times?
    Yes

  3. Was the reported FPS measured on a different dataset (e.g., DocStructBench) with simpler document layouts than DocLayNet? If so, would a speed discrepancy of this size be expected?
    As far as I can tell, the dataset has little influence on speed.

Kind regards,
Zhiyuan Zhao
