Training with PyTorch is significantly slower than with PaddleOCR #9
Please post the config file here and we will try to reproduce the issue.
Thank you for your prompt response. Here is the configuration file that we are using for our training setup. We hope this helps in reproducing the issue.
Paddle:
Based on the provided config file, we obtain the following results.
With AMP enabled:
The possible reasons are as follows:
In addition, Paddle may crash during training when AMP is used, so the AMP parameters need to be set carefully. Torch AMP performs consistently and is nearly identical to the full-precision results.
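For context, a minimal sketch of Paddle AMP with an explicit init_loss_scaling, the kind of parameter that needs care to avoid the crashes mentioned above. The model and data here are toy placeholders, not OpenOCR's or PaddleOCR's actual training loop:

```python
import paddle

# A minimal sketch of careful Paddle AMP usage; assumes a GPU build of Paddle.
# The model, optimizer, and data below are toy stand-ins.
model = paddle.nn.Linear(64, 8000)                        # stand-in for the recognizer
optimizer = paddle.optimizer.AdamW(parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)    # conservative initial scale

x = paddle.randn([32, 64])
target = paddle.randint(0, 8000, [32])

for _ in range(10):
    with paddle.amp.auto_cast(level='O1'):                # fp16 only for whitelisted ops
        logits = model(x)
        loss = paddle.nn.functional.cross_entropy(logits, target)
    scaled = scaler.scale(loss)                           # scale to avoid fp16 gradient underflow
    scaled.backward()
    scaler.minimize(optimizer, scaled)                    # unscale, skip step on inf/NaN, update scale
    optimizer.clear_grad()
```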
Thank you for your prompt response and for sharing your results. However, I am puzzled as to why my training speed decreases when AMP is enabled. It seems that instead of speeding up, my training slows down significantly with AMP. For your reference, I am using PyTorch version 2.1.1 with CUDA 11.8. Any insights or suggestions on what might be causing this issue would be greatly appreciated. Thank you!
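For reference, a minimal sketch of the canonical torch.cuda.amp training pattern (toy model and data, not OpenOCR's actual loop); comparing against this baseline can rule out AMP misuse as the cause:

```python
import torch

# Canonical torch.cuda.amp pattern as of torch 2.1: autocast for the forward
# pass, GradScaler for the backward pass. Toy model and data as stand-ins.
device = 'cuda'
model = torch.nn.Linear(64, 8000).to(device)
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 64, device=device)
target = torch.randint(0, 8000, (32,), device=device)

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                 # per-op fp16/fp32 choice
        loss = torch.nn.functional.cross_entropy(model(x), target)
    scaler.scale(loss).backward()                   # backward on the scaled loss
    scaler.step(optimizer)                          # unscales; skips step on inf/NaN
    scaler.update()
```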
It is indeed puzzling. We use the same setting: PyTorch version 2.1.1 with CUDA 11.8. What is the GPU type?
Thank you for your response. I am using four NVIDIA H800 GPUs for training.
Unfortunately, we have no way to run it on an H800; the GPUs we use are 4090s. If you can, please try to verify it again on another type of GPU.
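When comparing runs across GPU types, it may help to record each card's capabilities explicitly; a quick check using standard torch calls (for reference, the V100 is compute capability 7.0, the 3090 is 8.6, the 4090 is 8.9, and the H800 is 9.0):

```python
import torch

# Print the GPU's identity and precision support, so cross-GPU comparisons
# are explicit about what hardware each number came from.
print(torch.cuda.get_device_name(0))
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
print("bf16 supported:", torch.cuda.is_bf16_supported())
print("TF32 matmul allowed:", torch.backends.cuda.matmul.allow_tf32)
```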
Thank you for your understanding.
I conducted some experiments on a machine with V100 GPUs, using the OpenOCR code pulled directly from the repository without any modifications. Due to insufficient GPU memory, I had to reduce the batch size to 64. The results show that, without AMP, the PyTorch version seems to perform better; with AMP enabled, however, the performance of PyTorch's OpenOCR is significantly lower, which is quite strange. Here are the details.
Batch size: 64, GPU: V100
Without AMP:
With AMP enabled:
Additionally, when using AMP, I encounter the following warning, although the code does not appear to call these steps in that order:
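If the warning in question is PyTorch's common "lr_scheduler.step() before optimizer.step()" message (the warning text is not captured above, so this is an assumption), one known cause under AMP is that GradScaler silently skips optimizer.step() when it detects inf/NaN gradients, making the subsequent scheduler step look out of order even though the code calls them in the right order. A hedged sketch of one workaround, reusing the scaler/optimizer/lr_scheduler names from a standard AMP loop like the one sketched earlier:

```python
# Guard the scheduler step so it only runs when the optimizer step was
# actually taken; GradScaler shrinks its scale whenever it skips a step.
scale_before = scaler.get_scale()
scaler.step(optimizer)                    # may be skipped internally on inf/NaN grads
scaler.update()
if scaler.get_scale() >= scale_before:    # unchanged or grown scale => step was taken
    lr_scheduler.step()
```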
Please post the log like this:
[2024/08/04 11:36:10] openrec INFO: epoch: [1/2], global_step: 310, lr: 0.000006, acc: 0.000000, norm_edit_dis: 0.000000, num_samples: 64.000000, loss: 22.806549, avg_reader_cost: 0.00679 s, avg_batch_cost: 0.19636 s, avg_samples: 64.0, ips: 325.93692 samples/s, eta: 1:25:07
OpenOCR (without AMP):
OpenOCR (with AMP enabled):
That is very strange. Also, in our runs, AMP (OpenOCR) on 3090 GPUs gives a speedup of about 50%. You could try changing the torch version, or verifying other models.
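As a hedged complement to swapping torch versions, profiling a handful of steps can localize where the AMP slowdown actually occurs. A sketch using torch.profiler, where train_step is a hypothetical callable wrapping one training iteration (not a function from the repo):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a few steps, then compare the top CUDA kernels between a run with
# autocast enabled and one without; a large fp16-specific kernel or excessive
# cast ops would point at the culprit.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        train_step()   # hypothetical: one forward/backward/step iteration
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))
```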
Additionally, to investigate the performance further, I reduced the number of dictionary items from 8000 to 3000 and 1000, respectively. I observed that smaller dictionary sizes tend to align more closely with the expected performance improvements when AMP is enabled. Here are the details.
Batch size: 64, GPU: V100
Without AMP:
3000 dictionary items:
1000 dictionary items:
With AMP enabled:
3000 dictionary items:
1000 dictionary items:
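The dictionary-size experiments above hint that the final classification head (a hidden-dim × vocab-size GEMM plus cross-entropy) may be what scales badly under AMP on the V100. A hedged micro-benchmark sketch, independent of OpenOCR and with assumed dimensions, to test that hypothesis in isolation:

```python
import time
import torch

# Time only the classifier GEMM + cross-entropy, fp32 vs autocast fp16, for
# several dictionary sizes. Dimensions are assumptions; timings include
# one-off kernel warmup, so treat results as indicative only.
def bench(vocab, use_amp, iters=100, bsz=64, seqlen=32, dim=256):
    head = torch.nn.Linear(dim, vocab).cuda()
    x = torch.randn(bsz * seqlen, dim, device='cuda')
    y = torch.randint(0, vocab, (bsz * seqlen,), device='cuda')
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = torch.nn.functional.cross_entropy(head(x), y)
        loss.backward()
        head.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    return (time.time() - t0) / iters

for vocab in (1000, 3000, 8000):
    print(vocab, "fp32:", bench(vocab, False), "amp:", bench(vocab, True))
```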
Our experiments above were run with the English dictionary (94 categories plus 3 special categories). We will verify the effect of the number of character categories again based on the information you provided.
Chinese 6624-item dictionary.
Without AMP:
[2024/08/06 18:05:16] openrec INFO: epoch: [1/2], global_step: 100, lr: 0.000005, loss: 68.377701, avg_reader_cost: 0.00710 s, avg_batch_cost: 0.15078 s, avg_samples: 64.0, ips: 424.46925 samples/s, eta: 0:28:14
[2024/08/06 18:05:19] openrec INFO: epoch: [1/2], global_step: 110, lr: 0.000005, loss: 63.503082, avg_reader_cost: 0.00717 s, avg_batch_cost: 0.15115 s, avg_samples: 64.0, ips: 423.41447 samples/s, eta: 0:27:50
[2024/08/06 18:05:22] openrec INFO: epoch: [1/2], global_step: 120, lr: 0.000006, loss: 63.131191, avg_reader_cost: 0.00723 s, avg_batch_cost: 0.15088 s, avg_samples: 64.0, ips: 424.17841 samples/s, eta: 0:27:30
[2024/08/06 18:05:25] openrec INFO: epoch: [1/2], global_step: 130, lr: 0.000006, loss: 61.284630, avg_reader_cost: 0.00737 s, avg_batch_cost: 0.15119 s, avg_samples: 64.0, ips: 423.31444 samples/s, eta: 0:27:13
[2024/08/06 18:05:28] openrec INFO: epoch: [1/2], global_step: 140, lr: 0.000007, loss: 59.751579, avg_reader_cost: 0.00718 s, avg_batch_cost: 0.15115 s, avg_samples: 64.0, ips: 423.43237 samples/s, eta: 0:26:58
[2024/08/06 18:05:31] openrec INFO: epoch: [1/2], global_step: 150, lr: 0.000007, loss: 60.813232, avg_reader_cost: 0.00701 s, avg_batch_cost: 0.15105 s, avg_samples: 64.0, ips: 423.69061 samples/s, eta: 0:26:45
[2024/08/06 18:05:33] openrec INFO: epoch: [1/2], global_step: 160, lr: 0.000008, loss: 62.727783, avg_reader_cost: 0.00719 s, avg_batch_cost: 0.15147 s, avg_samples: 64.0, ips: 422.53764 samples/s, eta: 0:26:33
[2024/08/06 18:05:36] openrec INFO: epoch: [1/2], global_step: 170, lr: 0.000008, loss: 60.813217, avg_reader_cost: 0.00668 s, avg_batch_cost: 0.15051 s, avg_samples: 64.0, ips: 425.23351 samples/s, eta: 0:26:22
[2024/08/06 18:05:39] openrec INFO: epoch: [1/2], global_step: 180, lr: 0.000009, loss: 58.633625, avg_reader_cost: 0.00688 s, avg_batch_cost: 0.15068 s, avg_samples: 64.0, ips: 424.74361 samples/s, eta: 0:26:13
[2024/08/06 18:05:42] openrec INFO: epoch: [1/2], global_step: 190, lr: 0.000009, loss: 58.076485, avg_reader_cost: 0.00680 s, avg_batch_cost: 0.15077 s, avg_samples: 64.0, ips: 424.47697 samples/s, eta: 0:26:04
[2024/08/06 18:05:45] openrec INFO: epoch: [1/2], global_step: 200, lr: 0.000010, loss: 60.331661, avg_reader_cost: 0.00687 s, avg_batch_cost: 0.15088 s, avg_samples: 64.0, ips: 424.17754 samples/s, eta: 0:25:56
[2024/08/06 18:05:47] openrec INFO: epoch: [1/2], global_step: 210, lr: 0.000010, loss: 58.427017, avg_reader_cost: 0.00692 s, avg_batch_cost: 0.15091 s, avg_samples: 64.0, ips: 424.10510 samples/s, eta: 0:25:49
[2024/08/06 18:05:50] openrec INFO: epoch: [1/2], global_step: 220, lr: 0.000011, loss: 59.204517, avg_reader_cost: 0.00691 s, avg_batch_cost: 0.15059 s, avg_samples: 64.0, ips: 424.98623 samples/s, eta: 0:25:42
[2024/08/06 18:05:53] openrec INFO: epoch: [1/2], global_step: 230, lr: 0.000011, loss: 58.415764, avg_reader_cost: 0.00680 s, avg_batch_cost: 0.15089 s, avg_samples: 64.0, ips: 424.15837 samples/s, eta: 0:25:35
[2024/08/06 18:05:56] openrec INFO: epoch: [1/2], global_step: 240, lr: 0.000012, loss: 56.117561, avg_reader_cost: 0.00695 s, avg_batch_cost: 0.15076 s, avg_samples: 64.0, ips: 424.50422 samples/s, eta: 0:25:29
[2024/08/06 18:05:59] openrec INFO: epoch: [1/2], global_step: 250, lr: 0.000012, loss: 57.601730, avg_reader_cost: 0.00688 s, avg_batch_cost: 0.15065 s, avg_samples: 64.0, ips: 424.81621 samples/s, eta: 0:25:23
[2024/08/06 18:06:01] openrec INFO: epoch: [1/2], global_step: 260, lr: 0.000013, loss: 57.454727, avg_reader_cost: 0.00670 s, avg_batch_cost: 0.15068 s, avg_samples: 64.0, ips: 424.75228 samples/s, eta: 0:25:18
[2024/08/06 18:06:04] openrec INFO: epoch: [1/2], global_step: 270, lr: 0.000013, loss: 54.404583, avg_reader_cost: 0.00685 s, avg_batch_cost: 0.15115 s, avg_samples: 64.0, ips: 423.43403 samples/s, eta: 0:25:13
[2024/08/06 18:06:07] openrec INFO: epoch: [1/2], global_step: 280, lr: 0.000014, loss: 56.005775, avg_reader_cost: 0.00674 s, avg_batch_cost: 0.15045 s, avg_samples: 64.0, ips: 425.38014 samples/s, eta: 0:25:08
[2024/08/06 18:06:10] openrec INFO: epoch: [1/2], global_step: 290, lr: 0.000014, loss: 57.292404, avg_reader_cost: 0.00684 s, avg_batch_cost: 0.15064 s, avg_samples: 64.0, ips: 424.86011 samples/s, eta: 0:25:04
[2024/08/06 18:06:13] openrec INFO: epoch: [1/2], global_step: 300, lr: 0.000015, loss: 54.812862, avg_reader_cost: 0.00682 s, avg_batch_cost: 0.15047 s, avg_samples: 64.0, ips: 425.32251 samples/s, eta: 0:24:59
[2024/08/06 18:06:16] openrec INFO: epoch: [1/2], global_step: 310, lr: 0.000015, loss: 52.929249, avg_reader_cost: 0.00726 s, avg_batch_cost: 0.15144 s, avg_samples: 64.0, ips: 422.62292 samples/s, eta: 0:24:55
[2024/08/06 18:06:18] openrec INFO: epoch: [1/2], global_step: 320, lr: 0.000016, loss: 52.851337, avg_reader_cost: 0.00696 s, avg_batch_cost: 0.15058 s, avg_samples: 64.0, ips: 425.01887 samples/s, eta: 0:24:51
[2024/08/06 18:06:21] openrec INFO: epoch: [1/2], global_step: 330, lr: 0.000016, loss: 52.118073, avg_reader_cost: 0.00721 s, avg_batch_cost: 0.15060 s, avg_samples: 64.0, ips: 424.97143 samples/s, eta: 0:24:47
[2024/08/06 18:06:24] openrec INFO: epoch: [1/2], global_step: 340, lr: 0.000017, loss: 52.040649, avg_reader_cost: 0.00686 s, avg_batch_cost: 0.15040 s, avg_samples: 64.0, ips: 425.52080 samples/s, eta: 0:24:43
[2024/08/06 18:06:27] openrec INFO: epoch: [1/2], global_step: 350, lr: 0.000018, loss: 49.329895, avg_reader_cost: 0.00685 s, avg_batch_cost: 0.15061 s, avg_samples: 64.0, ips: 424.95145 samples/s, eta: 0:24:40
[2024/08/06 18:06:30] openrec INFO: epoch: [1/2], global_step: 360, lr: 0.000018, loss: 51.202164, avg_reader_cost: 0.00691 s, avg_batch_cost: 0.15066 s, avg_samples: 64.0, ips: 424.80195 samples/s, eta: 0:24:36
[2024/08/06 18:06:32] openrec INFO: epoch: [1/2], global_step: 370, lr: 0.000019, loss: 51.715294, avg_reader_cost: 0.00687 s, avg_batch_cost: 0.15062 s, avg_samples: 64.0, ips: 424.90214 samples/s, eta: 0:24:33
[2024/08/06 18:06:35] openrec INFO: epoch: [1/2], global_step: 380, lr: 0.000019, loss: 51.019779, avg_reader_cost: 0.00700 s, avg_batch_cost: 0.15098 s, avg_samples: 64.0, ips: 423.89481 samples/s, eta: 0:24:30
[2024/08/06 18:06:38] openrec INFO: epoch: [1/2], global_step: 390, lr: 0.000020, loss: 51.824024, avg_reader_cost: 0.00692 s, avg_batch_cost: 0.15073 s, avg_samples: 64.0, ips: 424.61166 samples/s, eta: 0:24:26
[2024/08/06 18:06:41] openrec INFO: epoch: [1/2], global_step: 400, lr: 0.000020, loss: 50.717896, avg_reader_cost: 0.00669 s, avg_batch_cost: 0.15070 s, avg_samples: 64.0, ips: 424.69469 samples/s, eta: 0:24:23
With AMP:
[2024/08/06 18:03:31] openrec INFO: epoch: [1/2], global_step: 100, lr: 0.000005, loss: 69.376022, avg_reader_cost: 0.00971 s, avg_batch_cost: 0.18023 s, avg_samples: 64.0, ips: 355.10891 samples/s, eta: 0:32:33
[2024/08/06 18:03:33] openrec INFO: epoch: [1/2], global_step: 110, lr: 0.000005, loss: 63.990349, avg_reader_cost: 0.01058 s, avg_batch_cost: 0.18164 s, avg_samples: 64.0, ips: 352.35357 samples/s, eta: 0:32:12
[2024/08/06 18:03:35] openrec INFO: epoch: [1/2], global_step: 120, lr: 0.000006, loss: 63.299240, avg_reader_cost: 0.01072 s, avg_batch_cost: 0.18242 s, avg_samples: 64.0, ips: 350.84672 samples/s, eta: 0:31:55
[2024/08/06 18:03:37] openrec INFO: epoch: [1/2], global_step: 130, lr: 0.000006, loss: 61.347343, avg_reader_cost: 0.01023 s, avg_batch_cost: 0.18157 s, avg_samples: 64.0, ips: 352.48265 samples/s, eta: 0:31:40
[2024/08/06 18:03:38] openrec INFO: epoch: [1/2], global_step: 140, lr: 0.000007, loss: 59.796600, avg_reader_cost: 0.01043 s, avg_batch_cost: 0.18167 s, avg_samples: 64.0, ips: 352.28051 samples/s, eta: 0:31:27
[2024/08/06 18:03:40] openrec INFO: epoch: [1/2], global_step: 150, lr: 0.000007, loss: 60.858330, avg_reader_cost: 0.01041 s, avg_batch_cost: 0.18174 s, avg_samples: 64.0, ips: 352.15369 samples/s, eta: 0:31:15
[2024/08/06 18:03:42] openrec INFO: epoch: [1/2], global_step: 160, lr: 0.000008, loss: 62.775482, avg_reader_cost: 0.00933 s, avg_batch_cost: 0.18135 s, avg_samples: 64.0, ips: 352.91465 samples/s, eta: 0:31:04
[2024/08/06 18:03:44] openrec INFO: epoch: [1/2], global_step: 170, lr: 0.000008, loss: 60.844872, avg_reader_cost: 0.01013 s, avg_batch_cost: 0.18200 s, avg_samples: 64.0, ips: 351.65021 samples/s, eta: 0:30:55
[2024/08/06 18:03:46] openrec INFO: epoch: [1/2], global_step: 180, lr: 0.000009, loss: 58.660248, avg_reader_cost: 0.01100 s, avg_batch_cost: 0.18293 s, avg_samples: 64.0, ips: 349.85706 samples/s, eta: 0:30:47
[2024/08/06 18:03:48] openrec INFO: epoch: [1/2], global_step: 190, lr: 0.000009, loss: 58.103447, avg_reader_cost: 0.01108 s, avg_batch_cost: 0.18313 s, avg_samples: 64.0, ips: 349.47937 samples/s, eta: 0:30:39
[2024/08/06 18:03:50] openrec INFO: epoch: [1/2], global_step: 200, lr: 0.000010, loss: 60.355034, avg_reader_cost: 0.01048 s, avg_batch_cost: 0.18146 s, avg_samples: 64.0, ips: 352.69481 samples/s, eta: 0:30:32
[2024/08/06 18:03:51] openrec INFO: epoch: [1/2], global_step: 210, lr: 0.000010, loss: 58.444603, avg_reader_cost: 0.00983 s, avg_batch_cost: 0.18169 s, avg_samples: 64.0, ips: 352.24348 samples/s, eta: 0:30:25
[2024/08/06 18:03:53] openrec INFO: epoch: [1/2], global_step: 220, lr: 0.000011, loss: 59.229370, avg_reader_cost: 0.00897 s, avg_batch_cost: 0.18147 s, avg_samples: 64.0, ips: 352.66891 samples/s, eta: 0:30:19
[2024/08/06 18:03:55] openrec INFO: epoch: [1/2], global_step: 230, lr: 0.000011, loss: 58.432549, avg_reader_cost: 0.00992 s, avg_batch_cost: 0.18137 s, avg_samples: 64.0, ips: 352.87377 samples/s, eta: 0:30:12
[2024/08/06 18:03:57] openrec INFO: epoch: [1/2], global_step: 240, lr: 0.000012, loss: 56.128662, avg_reader_cost: 0.00969 s, avg_batch_cost: 0.18164 s, avg_samples: 64.0, ips: 352.34506 samples/s, eta: 0:30:07
[2024/08/06 18:03:59] openrec INFO: epoch: [1/2], global_step: 250, lr: 0.000012, loss: 57.606712, avg_reader_cost: 0.01010 s, avg_batch_cost: 0.18331 s, avg_samples: 64.0, ips: 349.13365 samples/s, eta: 0:30:02
[2024/08/06 18:04:01] openrec INFO: epoch: [1/2], global_step: 260, lr: 0.000013, loss: 57.450527, avg_reader_cost: 0.01046 s, avg_batch_cost: 0.18173 s, avg_samples: 64.0, ips: 352.18021 samples/s, eta: 0:29:57
[2024/08/06 18:04:02] openrec INFO: epoch: [1/2], global_step: 270, lr: 0.000013, loss: 54.395607, avg_reader_cost: 0.01038 s, avg_batch_cost: 0.18134 s, avg_samples: 64.0, ips: 352.93065 samples/s, eta: 0:29:52
[2024/08/06 18:04:04] openrec INFO: epoch: [1/2], global_step: 280, lr: 0.000014, loss: 55.983822, avg_reader_cost: 0.01070 s, avg_batch_cost: 0.18134 s, avg_samples: 64.0, ips: 352.93195 samples/s, eta: 0:29:47
[2024/08/06 18:04:06] openrec INFO: epoch: [1/2], global_step: 290, lr: 0.000014, loss: 57.264977, avg_reader_cost: 0.01036 s, avg_batch_cost: 0.18174 s, avg_samples: 64.0, ips: 352.15549 samples/s, eta: 0:29:43
[2024/08/06 18:04:08] openrec INFO: epoch: [1/2], global_step: 300, lr: 0.000015, loss: 54.802746, avg_reader_cost: 0.01047 s, avg_batch_cost: 0.18134 s, avg_samples: 64.0, ips: 352.92583 samples/s, eta: 0:29:38
[2024/08/06 18:04:10] openrec INFO: epoch: [1/2], global_step: 310, lr: 0.000015, loss: 52.902641, avg_reader_cost: 0.01031 s, avg_batch_cost: 0.18130 s, avg_samples: 64.0, ips: 353.01035 samples/s, eta: 0:29:34
[2024/08/06 18:04:12] openrec INFO: epoch: [1/2], global_step: 320, lr: 0.000016, loss: 52.814461, avg_reader_cost: 0.01083 s, avg_batch_cost: 0.18140 s, avg_samples: 64.0, ips: 352.81770 samples/s, eta: 0:29:30
[2024/08/06 18:04:14] openrec INFO: epoch: [1/2], global_step: 330, lr: 0.000016, loss: 52.073181, avg_reader_cost: 0.01115 s, avg_batch_cost: 0.18350 s, avg_samples: 64.0, ips: 348.77103 samples/s, eta: 0:29:27
[2024/08/06 18:04:15] openrec INFO: epoch: [1/2], global_step: 340, lr: 0.000017, loss: 51.995941, avg_reader_cost: 0.01031 s, avg_batch_cost: 0.18142 s, avg_samples: 64.0, ips: 352.77681 samples/s, eta: 0:29:23
[2024/08/06 18:04:17] openrec INFO: epoch: [1/2], global_step: 350, lr: 0.000018, loss: 49.285133, avg_reader_cost: 0.01002 s, avg_batch_cost: 0.18138 s, avg_samples: 64.0, ips: 352.84831 samples/s, eta: 0:29:19
[2024/08/06 18:04:19] openrec INFO: epoch: [1/2], global_step: 360, lr: 0.000018, loss: 51.151272, avg_reader_cost: 0.00993 s, avg_batch_cost: 0.18172 s, avg_samples: 64.0, ips: 352.19066 samples/s, eta: 0:29:16
[2024/08/06 18:04:21] openrec INFO: epoch: [1/2], global_step: 370, lr: 0.000019, loss: 51.662258, avg_reader_cost: 0.00962 s, avg_batch_cost: 0.18115 s, avg_samples: 64.0, ips: 353.29306 samples/s, eta: 0:29:12
[2024/08/06 18:04:23] openrec INFO: epoch: [1/2], global_step: 380, lr: 0.000019, loss: 50.934994, avg_reader_cost: 0.00991 s, avg_batch_cost: 0.18152 s, avg_samples: 64.0, ips: 352.57631 samples/s, eta: 0:29:09
[2024/08/06 18:04:25] openrec INFO: epoch: [1/2], global_step: 390, lr: 0.000020, loss: 51.761528, avg_reader_cost: 0.00978 s, avg_batch_cost: 0.18120 s, avg_samples: 64.0, ips: 353.19660 samples/s, eta: 0:29:06
[2024/08/06 18:04:26] openrec INFO: epoch: [1/2], global_step: 400, lr: 0.000020, loss: 50.644489, avg_reader_cost: 0.00957 s, avg_batch_cost: 0.18125 s, avg_samples: 64.0, ips: 353.10103 samples/s, eta: 0:29:02
The IPS is consistent with what you posted. However, the IPS calculation in the code appears to be working without error. We will follow up with deeper troubleshooting of this issue. In addition, when the dictionary is very large, it is recommended to set cal_metric_during_train to False; this speeds up training significantly.
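To cross-check the logged IPS independently of the logger, a simple wall-clock measurement sketch (train_step and the batch size are hypothetical stand-ins, not names from the repo):

```python
import time
import torch

# Measure throughput over a fixed number of steps, synchronizing so that all
# queued GPU work is included in the elapsed time.
steps, batch_size = 100, 64
torch.cuda.synchronize()
start = time.time()
for _ in range(steps):
    train_step()   # hypothetical: one training iteration
torch.cuda.synchronize()
elapsed = time.time() - start
print(f"ips: {steps * batch_size / elapsed:.2f} samples/s")
```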
I appreciate your insights regarding the IPS and the log timestamps showing that AMP indeed worked. I'll look forward to further troubleshooting on this issue. Regarding the suggestion to set cal_metric_during_train to False when the dictionary is large, I will make this adjustment in my experiments to see how it impacts the training speed. Thank you again for your assistance!
I somewhat suspect that the post-processing of the two frameworks is inconsistent. You could save the recognition results of both and observe the differences between them.
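A hedged sketch of that comparison, assuming each framework's decoded predictions have been dumped to JSON as a mapping from image path to predicted text (the file names and format are illustrative, not either repo's actual output):

```python
import json

# Load the two prediction dumps and print every image where the decoded
# strings disagree, to surface post-processing inconsistencies.
with open('preds_openocr.json') as f1, open('preds_paddleocr.json') as f2:
    a, b = json.load(f1), json.load(f2)   # {image_path: predicted_text}

for k in sorted(set(a) | set(b)):
    if a.get(k) != b.get(k):
        print(k, '| openocr:', a.get(k), '| paddle:', b.get(k))
```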
What system and Paddle version are you using to train rec_svtrv2_ch.yml?
I have noticed that training our model with PyTorch is significantly slower than with PaddleOCR. Here are some specific examples using the SVTR model under different settings:
Without AMP (Automatic Mixed Precision):
PaddleOCR: 310 samples/s
PyTorch: 250 samples/s
With AMP enabled:
PaddleOCR: 490 samples/s
PyTorch: 150 samples/s
Is there any known reason for such a discrepancy in training speeds? Are there any optimizations or configurations that we might be missing in our PyTorch setup to achieve similar or better performance than PaddleOCR?