RecursionError: maximum recursion depth exceeded in comparison #16

Open
hp0716 opened this issue Sep 2, 2024 · 12 comments

hp0716 commented Sep 2, 2024

Hello, what should I pay attention to when using create_lmdb_dataset.py to construct the dataset? After generating the .mdb file, I get an error when running eval_rec_all_ch.py:
[screenshot]

hp0716 (Author) commented Sep 2, 2024

When I create the LMDB data, do I need to modify the config file as follows?

Eval:
  dataset:
    name: LMDBDataSet
    ds_width: True
    ...
  sampler:
    name: MultiScaleSampler

Topdu (Owner) commented Sep 2, 2024

This is caused by unsuccessful data preprocessing. Please check where the preprocessing goes wrong in ratio_dataset.py, at outs = transform(data, self.ops[:-1]).

hp0716 (Author) commented Sep 3, 2024

I think my main problem is that I don't know how to construct my own dataset. Should I use create_lmdb_dataset.py? My data consists of Chinese addresses; what specific adjustments need to be made?
[screenshot]

hp0716 (Author) commented Sep 3, 2024

Because I had a problem creating the dataset, transform always returns None. How can I solve this?
[screenshot]

Topdu (Owner) commented Sep 3, 2024

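# Excerpt from tools/create_lmdb_dataset.py; get_datalist and createDataset
# are defined earlier in that file.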
if __name__ == '__main__':
    data_dir = './Union14M-L/'
    label_file_list = [
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_challenging.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_easy.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_hard.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_medium.jsonl.txt',
        './Union14M-L/train_annos/filter_jsonl_mmocr0.x/filter_train_normal.jsonl.txt'
    ]
    save_path_root = './Union14M-L-LMDB-Filtered/'

    for data_list in label_file_list:
        save_path = save_path_root + data_list.split('/')[-1].split(
            '.')[0] + '/'
        os.makedirs(save_path, exist_ok=True)
        print(save_path)
        train_data_list = get_datalist(data_dir, data_list, 800)

        createDataset(train_data_list, save_path)

You need to modify data_dir, label_file_list, and save_path_root in tools/create_lmdb_dataset.py. Each line of the label file should have the form img_dir_name + \t + label, for example:
full_images/COCOTextV2_append/imgs/img_107271_0.jpg LAYI SILK
full_images/COCOTextV2_append/imgs/img_107339_2.jpg PumUoo
full_images/COCOTextV2_append/imgs/img_107339_7.jpg GA
where data_dir + img_dir_name is a readable path to the image.
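For reference, a minimal sketch of generating such a label file (the samples list and the output filename label_file.txt are illustrative, not from the repository):

# Hypothetical sketch: write the tab-separated label file expected by
# tools/create_lmdb_dataset.py. Paths and labels here are illustrative.
samples = [
    ('full_images/COCOTextV2_append/imgs/img_107271_0.jpg', 'LAYI SILK'),
    ('full_images/COCOTextV2_append/imgs/img_107339_2.jpg', 'PumUoo'),
]
with open('label_file.txt', 'w', encoding='utf-8') as f:
    for img_dir_name, label in samples:
        f.write(img_dir_name + '\t' + label + '\n')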

Topdu (Owner) commented Sep 3, 2024

Because I had a problem creating the dataset, transform always returns None. How can I solve this?

This is because the data preprocessing failed. When data comes back as None, you can print the current op:

def transform(data, ops=None):
    """transform."""
    if ops is None:
        ops = []
    for op in ops:
        data = op(data)
        if data is None:
            print(op)
            return None
    return data

Then find the specific code of that op and try to locate the exact reason it returns None. It is usually either the label failing a check (for example, its length exceeds max_text_length) or the image failing to load.
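As an illustration (not the repository's exact code), a label-encoding op typically filters a sample like this:

class LabelEncodeSketch:
    # Illustrative pattern only: an op returns None for labels that are
    # empty or longer than max_text_length, and transform() then drops
    # the whole sample.
    def __init__(self, max_text_length=25):
        self.max_text_length = max_text_length

    def __call__(self, data):
        label = data['label']
        if len(label) == 0 or len(label) > self.max_text_length:
            return None
        return data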

hp0716 (Author) commented Sep 4, 2024

Thanks for the answer. It was indeed a problem caused by exceeding the maximum length. However, the output accuracy is all 0; what other adjustments to the dataset or parameters are needed?
[screenshot]

hp0716 (Author) commented Sep 4, 2024

The test code above runs correctly, but the recognition accuracy is very low. I want to build my own address dataset and generate data while training. Before training, is it mandatory to convert the images and labels to LMDB format?

Topdu (Owner) commented Sep 4, 2024

RatioDataset only supports loading data in LMDB format; you can also modify it to load data in custom formats.
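A minimal sketch of that idea, assuming the tab-separated label format above and ignoring the aspect-ratio bucketing that RatioDataset performs (the class and its names are hypothetical):

import os
from PIL import Image
from torch.utils.data import Dataset

class SimpleRecDataset(Dataset):
    # Hypothetical custom loader: reads image/label pairs directly from a
    # tab-separated label file instead of an LMDB database.
    def __init__(self, data_dir, label_file):
        self.data_dir = data_dir
        with open(label_file, 'r', encoding='utf-8') as f:
            self.samples = [line.rstrip('\n').split('\t') for line in f]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_dir_name, label = self.samples[idx]
        img = Image.open(os.path.join(self.data_dir, img_dir_name)).convert('RGB')
        return img, label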

Topdu (Owner) commented Sep 7, 2024

Thanks for the answer. It was indeed a problem caused by exceeding the maximum length. However, the output accuracy is all 0; what other adjustments to the dataset or parameters are needed?

Analyzing the dataset helps determine the best parameters, such as the text length and the aspect-ratio distribution of the samples.
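For example, a quick sketch to inspect both distributions (assuming the tab-separated label format above; data_dir and label_file.txt are placeholders):

import collections
import os
from PIL import Image

# Hypothetical helper: tally label lengths and W/H aspect ratios so that
# max_text_length and the sampler's ratio buckets can be chosen from data.
data_dir = './train_data/'
len_hist, ratio_hist = collections.Counter(), collections.Counter()
with open('label_file.txt', encoding='utf-8') as f:
    for line in f:
        img_dir_name, label = line.rstrip('\n').split('\t')
        len_hist[len(label)] += 1
        w, h = Image.open(os.path.join(data_dir, img_dir_name)).size
        ratio_hist[round(w / h)] += 1
print(sorted(len_hist.items()))    # text-length distribution
print(sorted(ratio_hist.items()))  # aspect-ratio distribution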

leduy-it commented Dec 9, 2024

@Topdu @hp0716
I am experiencing the same issue. I have already set max_text_len = 200, and none of the labels in my dataset exceed 200 characters. However, I am still encountering this error.

[2024/12/09 09:21:17] openrec INFO: data_idx_order_list shape: (12631, 2)
[2024/12/09 09:21:17] openrec INFO: Loaded WH ratios from cache.
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 0: 0
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 1: 691
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 2: 4302
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 3: 2550
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 4: 1079
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 5: 489
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 6: 2139
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 7: 193
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 8: 185
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 9: 176
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 10: 178
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 11: 162
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 12: 138
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 13: 99
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 14: 86
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 15: 50
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 16: 52
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 17: 16
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 18: 16
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 19: 13
[2024/12/09 09:21:17] openrec INFO: Number of samples with ratio 20: 17
0 691 48.0 21 32
0 4302 96.0 134 32
0 2550 144.0 79 32
0 1079 192.0 33 32
0 489 240.0 19 25
0 2139 288.0 101 21
0 193 336.0 10 18
0 185 384.0 11 16
0 176 432.0 12 14
0 178 480.0 14 12
0 162 528.0 14 11
0 138 576.0 13 10
0 99 624.0 11 9
0 86 672.0 9 9
0 50 720.0 6 8
0 52 768.0 6 8
0 16 816.0 2 7
0 16 864.0 2 7
0 13 912.0 2 6
0 17 960.0 2 6
[2024/12/09 09:21:17] openrec INFO: valid dataloader has 509 iters
[2024/12/09 09:21:18] openrec INFO: {'Total': 28810809, 'Trainable': 28810809}
[2024/12/09 09:21:18] openrec INFO: remove decoder.gtc_decoder.char_embed.embedding.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.gtc_decoder.ques1_head.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.gtc_decoder.ques1_head.bias
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.char_token
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.fc.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.fc.bias
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.fc_kv.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.fc_kv.bias
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.norm1.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.norm1.bias
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.attn.qkv.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.attn.proj.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.attn.proj.bias
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.norm2.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.norm2.bias
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.mlp.fc1.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.mlp.fc1.bias
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.mlp.fc2.weight
[2024/12/09 09:21:18] openrec INFO: remove decoder.ctc_decoder.w_atten_block.mlp.fc2.bias
[2024/12/09 09:21:18] openrec INFO: load pretrained model from ./pretrained/best.pth
[2024/12/09 09:21:18] openrec INFO: decoder.gtc_decoder.char_embed.embedding.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.gtc_decoder.ques1_head.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.gtc_decoder.ques1_head.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.char_token is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.fc.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.fc.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.fc_kv.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.fc_kv.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.norm1.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.norm1.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.attn.qkv.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.attn.proj.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.attn.proj.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.norm2.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.norm2.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.mlp.fc1.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.mlp.fc1.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.mlp.fc2.weight is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: decoder.ctc_decoder.w_atten_block.mlp.fc2.bias is not in pretrained model
[2024/12/09 09:21:18] openrec INFO: finetune from checkpoint ./pretrained/best.pth
[2024/12/09 09:21:18] openrec INFO: run with torch 2.3.1 and device cuda:0
[2024/12/09 09:21:18] openrec INFO: During the training process, after the 0th epoch, an evaluation is run every 1 epoch
[2024/12/09 09:21:18] openrec INFO: During the training process, after the 0th iteration, an evaluation is run every 10 iterations
0 23921 48.0 747 32
0 20632 96.0 644 32
0 10328 144.0 322 32
0 3671 192.0 114 32
0 1488 240.0 59 25
0 3733 288.0 177 21
0 7139 336.0 396 18
0 3329 384.0 208 16
0 1270 432.0 90 14
0 814 480.0 67 12
0 616 528.0 56 11
0 448 576.0 44 10
0 308 624.0 34 9
0 188 672.0 20 9
0 82 720.0 10 8
0 22 768.0 2 8
0 7 816.0 7
0 3 864.0 7
0 1 912.0 6
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at /opt/conda/conda-bld/pytorch_1716905971132/work/aten/src/ATen/native/cudnn/Conv_v8.cpp:84.)
  return F.conv2d(input, weight, bias, self.stride,
/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:143: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
eval model::  98%|██████████████████████████████████████████████████████████████████████████████████▌ | 500/509 [05:51<00:06,  1.48it/s]Traceback (most recent call last):
  File "/workspace/TorchOCR_duyle/tools/train_rec.py", line 37, in <module>
    main()
  File "/workspace/TorchOCR_duyle/tools/train_rec.py", line 33, in main
    trainer.train()
  File "/workspace/TorchOCR_duyle/tools/engine/trainer.py", line 384, in train
    cur_metric = self.eval()
  File "/workspace/TorchOCR_duyle/tools/engine/trainer.py", line 542, in eval
    for idx, batch in enumerate(self.valid_dataloader):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 705, in reraise
    raise exception
RecursionError: Caught RecursionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/workspace/TorchOCR_duyle/tools/data/ratio_dataset_tvresize.py", line 269, in __getitem__
    return self.__getitem__([img_width, img_height, ids[0], ratio])
  File "/workspace/TorchOCR_duyle/tools/data/ratio_dataset_tvresize.py", line 269, in __getitem__
    return self.__getitem__([img_width, img_height, ids[0], ratio])
  File "/workspace/TorchOCR_duyle/tools/data/ratio_dataset_tvresize.py", line 269, in __getitem__
    return self.__getitem__([img_width, img_height, ids[0], ratio])
  [Previous line repeated 961 more times]
  File "/workspace/TorchOCR_duyle/tools/data/ratio_dataset_tvresize.py", line 263, in __getitem__
    outs = transform(data, self.ops[:-1])
  File "/workspace/TorchOCR_duyle/openrec/preprocess/__init__.py", line 47, in transform
    data = op(data)
  File "/workspace/TorchOCR_duyle/openrec/preprocess/__init__.py", line 119, in __call__
    img = data['image']
  File "/opt/conda/lib/python3.10/site-packages/PIL/Image.py", line 3318, in open
    im = _open_core(fp, filename, prefix, formats)
  File "/opt/conda/lib/python3.10/site-packages/PIL/Image.py", line 3304, in _open_core
    im = factory(fp, filename)
  File "/opt/conda/lib/python3.10/site-packages/PIL/JpegImagePlugin.py", line 840, in jpeg_factory
    im = JpegImageFile(fp, filename)
  File "/opt/conda/lib/python3.10/site-packages/PIL/ImageFile.py", line 123, in __init__
    if is_path(fp):
  File "/opt/conda/lib/python3.10/site-packages/PIL/_util.py", line 10, in is_path
    return isinstance(f, (bytes, str, os.PathLike))
  File "/opt/conda/lib/python3.10/abc.py", line 119, in __instancecheck__
    return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison

eval model::  98%|██████████████████████████████████████████████████████████████████████████████████▌ | 500/509 [05:51<00:06,  1.42it/s]

Topdu (Owner) commented Dec 9, 2024

I suspect it is because the last iter has only one image ("0 1 912.0 6" indicates GPU 0, 1 image of width 912, and a preset batch size of 6). When __getitem__ hits an error, it randomly selects another image of the same size, but there is only one image with width 912, so the retries exceed the recursion limit and raise the error.
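One way to avoid the unbounded recursion (a sketch, not the repository's fix; _load_sample and _random_same_ratio_prop are hypothetical stand-ins for the existing logic in ratio_dataset_tvresize.py) is to retry iteratively with a cap:

# Sketch only: bound the number of retries instead of recursing, and fail
# loudly when a ratio bucket has no loadable sample at all.
def __getitem__(self, properties, max_retries=10):
    for _ in range(max_retries):
        outs = self._load_sample(properties)  # existing load + transform logic
        if outs is not None:
            return outs
        # pick another sample with the same target width/height
        properties = self._random_same_ratio_prop(properties)
    raise RuntimeError('no valid sample found for properties %s' % (properties,))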
