[Bug] some hidden error when using sif4sci or GensimWordTokenizer #114

Open
KenelmQLH opened this issue Apr 6, 2022 · 1 comment

🐛 Description

sif4sci may return None. Similarly, GensimWordTokenizer may also return None.

Error Message

(Paste the complete error message. Please also include stack trace by setting environment variable DMLC_LOG_STACK_TRACE_DEPTH=100 before running your script.)

To Reproduce

(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

Steps to reproduce

(Paste the commands you ran that produced the error.)

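import json
from EduNLP.SIF import sif4sci, is_sif, to_sif
from EduNLP.Pretrain import GensimWordTokenizer  # used below; this import was missing from the snippet

def load_items2():
  # Read one JSON object per line from OpenLUNA.json
  items = []
  with open("OpenLUNA.json", encoding="utf-8") as f:
    for line in f:
      items.append(json.loads(line))
  return items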

items = load_items2()

# ----------------------------------------- #
tokenization_params1 = {
  "formula_params": {
    "method": "linear",
    "symbolize_figure_formula": True
  }
}

tokenizer = GensimWordTokenizer(symbol="fgm")

# ----------------------------------------- #
wrong_num = 0
for item in items:  
  res = sif4sci(item["stem"], symbol="gm", tokenization_params=tokenization_params1, errors="ignore")
  # res = tokenizer(item["stem"])

  if res is None:
    wrong_num += 1


print(f"There are {wrong_num} / {len(items)} wrong cases!")
# There are 156 / 792 wrong cases!

What have you tried to solve it?

I found that this is caused by the way we handle raised errors, which is set to "ignore" in GensimWordTokenizer.

However, looking at the specific errors, I found that one main type is related to the SIF Parser, so I wonder whether we need to handle this problem.
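To see which error types dominate, the failures can be tallied by exception type instead of only being counted. The following is a minimal sketch: `tally_failures` and `toy_parse` are hypothetical names, and `toy_parse` is only a stand-in for a call such as sif4sci or is_sif that is configured to raise rather than ignore errors.

```python
from collections import Counter


def tally_failures(func, inputs):
    """Call func on each input and count raised exceptions by type name.

    Hypothetical helper, not part of EduNLP.
    """
    counts = Counter()
    examples = {}  # first failing input seen for each exception type
    for value in inputs:
        try:
            func(value)
        except Exception as exc:  # deliberately broad: we want every error type
            name = type(exc).__name__
            counts[name] += 1
            examples.setdefault(name, value)
    return counts, examples


def toy_parse(text):
    """Stand-in parser (assumption): rejects any text containing '='."""
    if "=" in text:
        raise ValueError("unexpected '='")
    return text.split()


counts, examples = tally_failures(toy_parse, ["a b", "n=", "p=", "c"])
print(counts.most_common())
print(examples)
```

Keeping one failing input per exception type gives a ready-made reproduction case for each kind of error.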

For example, the Parser cannot identify "n=" and "p=".

(1)

s1 = "执行右面的程序框图,则输出的n=$\\FigureID{3bf20b93-8af1-11eb-b205-b46bfc50aa29}$$\\FigureID{59b88b3f-8af1-11eb-9450-b46bfc50aa29}$$\\FigureID{63116570-8b75-11eb-b694-b46bfc50aa29}$$\\FigureID{6a006177-8b76-11eb-9ac0-b46bfc50aa29}$$\\FigureID{088f15e9-8b7c-11eb-959f-b46bfc50aa29}$"
is_sif(s1)
RecursionError                            Traceback (most recent call last)
<ipython-input-3-a8de420882df> in <module>
     11 
     12 # ----------------------------------------- #
---> 13 is_sif(s1)
     14 
     15 # ----------------------------------------- #

e:\workustc\edunlp\workmaster\edunlp\EduNLP\SIF\sif.py in is_sif(item, check_formula, return_parser)
     50     """
     51     item_parser = Parser(item, check_formula)
---> 52     item_parser.description_list()
     53     if item_parser.fomula_illegal_flag:
     54         raise ValueError(item_parser.fomula_illegal_message)

e:\workustc\edunlp\workmaster\edunlp\EduNLP\SIF\parser\parser.py in description_list(self)
    344         """
    345         # print('call description_list')
--> 346         self.description()
    347         if self.error_flag:
    348             # print("Error")

e:\workustc\edunlp\workmaster\edunlp\EduNLP\SIF\parser\parser.py in description(self)
    304         #         if self.error_flag:
    305         #             return
--> 306         self.txt_list()
    307         if self.error_flag:
    308             return

e:\workustc\edunlp\workmaster\edunlp\EduNLP\SIF\parser\parser.py in txt_list(self)
    298             return
    299         if self.lookahead != self.empty:
--> 300             self.txt_list()
    301 
    302     def description(self):

... last 1 frames repeated, from the frame below ...

e:\workustc\edunlp\workmaster\edunlp\EduNLP\SIF\parser\parser.py in txt_list(self)
    298             return
    299         if self.lookahead != self.empty:
--> 300             self.txt_list()
    301 
    302     def description(self):

RecursionError: maximum recursion depth exceeded in comparison
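The traceback shows txt_list calling itself for as long as the lookahead is not empty. It does not show why the lookahead never empties, so the sketch below rests on an assumption: that an unrecognized token is never consumed. `ToyParser` is a hypothetical stand-in, not EduNLP's Parser. It contrasts the recursive pattern with a loop that checks for progress and reports the offending token.

```python
class ToyParser:
    """Hypothetical stand-in used only to illustrate the recursion pattern."""

    def __init__(self, tokens, known):
        self.tokens = list(tokens)
        self.known = set(known)
        self.pos = 0

    def _consume_known(self):
        # Advance only when the current token is recognized (the assumed flaw).
        if self.pos < len(self.tokens) and self.tokens[self.pos] in self.known:
            self.pos += 1

    def txt_list_recursive(self):
        # Mirrors the shape in the traceback: recurse while input remains.
        self._consume_known()
        if self.pos < len(self.tokens):
            self.txt_list_recursive()

    def txt_list_iterative(self):
        # Same traversal as a loop, with an explicit no-progress check.
        while self.pos < len(self.tokens):
            before = self.pos
            self._consume_known()
            if self.pos == before:
                token = self.tokens[self.pos]
                raise ValueError(
                    f"unrecognized token {token!r} at position {self.pos}"
                )


def outcome(method_name, tokens, known):
    """Run one parsing method and return the exception type name, or 'ok'."""
    parser = ToyParser(tokens, known)
    try:
        getattr(parser, method_name)()
    except (RecursionError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


recursive_result = outcome("txt_list_recursive", ["a", "?", "b"], {"a", "b"})
iterative_result = outcome("txt_list_iterative", ["a", "?", "b"], {"a", "b"})
clean_result = outcome("txt_list_iterative", ["a", "b"], {"a", "b"})
print(recursive_result, iterative_result, clean_result)
```

With a check like this, a stuck parse surfaces as an error that names the token, rather than as a RecursionError that "ignore" turns into None.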

Environment

Operating System: Windows

Python Version: Python 3.6

Additional context


tswsxk commented Apr 7, 2022

Yes, I think we should handle it

@KenelmQLH KenelmQLH self-assigned this Apr 13, 2022
@KenelmQLH KenelmQLH assigned pingzhili and unassigned KenelmQLH May 22, 2022