Problem description
Describe the problem: when uploading some PDFs, the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 9: invalid continuation byte is raised and the translation cannot proceed. I pulled the image directly with Docker. From what I can see, converter.py should already have fixed this issue. Is it the case that the source code in the Docker image has not been updated?
Run log:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 9: invalid continuation byte
Files before translation: ['20250117233209_kzc84ja3-dual.pdf', 'BL24C256A-test.pdf', 'G_series_Lua_API-mono.pdf', 'BL24C256A-test1.pdf', 'mathematics-10-03413-v2-dual.pdf', 'BL24C256A-008.pdf', 'mathematics-10-03413-v2-mono.pdf', 'BL24C256A-3.pdf', 'Graph Equations Involving Tensor Product of Graphs.pdf', 'BL24C256A-2-mono.pdf', 'BL24C256A-2.pdf', 'Graph Equations Involving Tensor Product of Graphs-mono.pdf', 'BL24C256A.pdf', 'BL24C256A-1-mono.pdf', 'BL24C256A-1.pdf', 'BL24C256A-1-dual.pdf', 'C700177_模数转换芯片ADC_SGM58031XMS10G-TR_规格书_WJ1490145.PDF', 'BL24C256A-2-dual.pdf', '20250117233209_kzc84ja3.pdf', 'Graph Equations Involving Tensor Product of Graphs-dual.pdf', 'BL24C256A-09.pdf', 'C2859066_心率传感器_MAX30101EFDT_规格书_WJ09450.PDF', 'G_series_Lua_API.pdf', 'mathematics-10-03413-v2.pdf', 'G_series_Lua_API-dual.pdf', '20250117233209_kzc84ja3-mono.pdf']
{'files': ['pdf2zh_files/BL24C256A-2.pdf'], 'pages': None, 'lang_in': 'en', 'lang_out': 'zh', 'service': 'google', 'output': PosixPath('pdf2zh_files'), 'thread': 4, 'callback': <function translate_file.<locals>.progress_bar at 0x7f232f5d2480>}
0%| | 0/16 [00:00<?, ?it/s]
6%|▋ | 1/16 [00:01<00:15, 1.06s/it]
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/gradio/queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 2047, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 1594, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2505, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 1005, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/gradio/utils.py", line 869, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pdf2zh/gui.py", line 165, in translate_file
translate(**param)
File "/usr/local/lib/python3.12/site-packages/pdf2zh/high_level.py", line 278, in translate
s_mono, s_dual = translate_stream(s_raw, **locals())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pdf2zh/high_level.py", line 213, in translate_stream
obj_patch: dict = translate_patch(fp, **locals())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pdf2zh/high_level.py", line 148, in translate_patch
interpreter.process_page(page)
File "/usr/local/lib/python3.12/site-packages/pdf2zh/pdfinterp.py", line 266, in process_page
ops_new = self.device.end_page(page)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pdf2zh/converter.py", line 56, in end_page
return self.receive_layout(self.cur_item)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pdf2zh/converter.py", line 224, in receive_layout
or vflag(child.fontname, child.get_text()) # 3. 公式字体
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pdf2zh/converter.py", line 175, in vflag
font = font.decode()
^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 9: invalid continuation byte
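The traceback ends in `vflag`, where a font name held as `bytes` is decoded with a bare `font.decode()`, which defaults to UTF-8. The sketch below reproduces that failure and shows one tolerant alternative. It rests on assumptions. `SAMPLE_FONTNAME` is a hypothetical font name: a 6-letter subset prefix followed by GBK-encoded bytes, chosen so that byte 0xc8 lands at position 9. The real font name in BL24C256A.pdf is not known. `decode_fontname` is a hypothetical helper and is not pdf2zh's actual fix.

```python
from typing import Union

# Hypothetical font name for illustration only: subset prefix "ABCDEF+" followed by
# non-UTF-8 (GBK-encoded) bytes. The real font name in the reported PDF is unknown.
SAMPLE_FONTNAME = b"ABCDEF+\xce\xa2\xc8\xed\xd1\xc5\xba\xda"


def strict_decode(font: bytes) -> str:
    # Mirrors the failing call in the traceback: bytes.decode() defaults to UTF-8.
    return font.decode()


def decode_fontname(font: Union[str, bytes]) -> str:
    """Hypothetical tolerant decoder (a sketch, not pdf2zh's actual fix)."""
    if isinstance(font, str):
        return font
    try:
        return font.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to exactly one code point, so this
        # cannot fail; ASCII parts of the name (e.g. the subset prefix) are kept.
        return font.decode("latin-1")


if __name__ == "__main__":
    try:
        strict_decode(SAMPLE_FONTNAME)
    except UnicodeDecodeError as exc:
        # 0xce 0xa2 at positions 7-8 is a valid 2-byte sequence; 0xc8 at position 9
        # starts another one, but 0xed is not a continuation byte.
        print(exc)
    print(decode_fontname(SAMPLE_FONTNAME))
```

Falling back to Latin-1 may produce mojibake for the non-ASCII part. A name check that only looks at ASCII patterns would still see the same characters it would have seen with a correct decode, and the conversion itself can no longer raise.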
Test document
Important
Please provide a PDF document that can be used to reproduce the problem.
BL24C256A.pdf
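In my tests, the latest source code translates this file correctly, and the Docker image has indeed not been updated yet. Output from the current backend: BL24C256A-dual.pdf
Output from the new backend: BL24C256A.zh-CN.dual.pdf
Please also note that this project is not currently optimized for technical documents, so the results may not be good. Optimization for technical documents is on the longer-term to-do list. The main goals for the new backend right now are improving results on academic papers and fixing bugs, so please be patient.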
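The latest source has the fix but the Docker image did not, so one way to confirm which release a container actually ships is to query the installed distribution's version from inside it. The following is a minimal sketch. It assumes the distribution is named `pdf2zh`, matching the `site-packages/pdf2zh` path in the traceback. The helper name `installed_version` is hypothetical. You would compare the printed version against the latest release yourself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is published under the name "pdf2zh".
    print("pdf2zh:", installed_version("pdf2zh"))
```

Returning `None` rather than raising keeps the check usable when the script runs on a host where the package is not installed.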
OK, looking forward to the upcoming updates.
Updated.