📘 [Wordlist] Add a wiktionary crawler #167

Closed
camarm-dev opened this issue Jun 9, 2024 · 2 comments · Fixed by #169
Labels
word About definition and word adding.

Comments

@camarm-dev
Owner

Crawl Wiktionary for all word pages and add each word and its phonetic transcription to the wordlist:

  1. Crawl a word page
  2. Check whether the word is already in the wordlist; if it is, skip it
  3. Otherwise, add it to a temporary file, so scripts/add_word.py can perform a fast insertion into the Remède database
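
Steps 2 and 3 could be sketched roughly as below. This is only an illustration: `stage_new_words`, the one-word-per-line staging format, and the case-insensitive comparison are assumptions, not what scripts/add_word.py actually expects.

```python
def stage_new_words(candidates, wordlist, staging_path):
    """Append candidates missing from the wordlist to a staging file.

    Hypothetical helper: words are compared lowercased and written one
    per line, which is an assumed format rather than the project's own.
    """
    # Normalise the existing wordlist once so each lookup is O(1).
    known = {w.strip().lower() for w in wordlist}
    added = []
    with open(staging_path, "a", encoding="utf-8") as staging:
        for word in candidates:
            key = word.strip().lower()
            # Step 2: skip empty entries and words that are already known.
            if not key or key in known:
                continue
            # Step 3: stage the word and remember it to avoid duplicates.
            known.add(key)
            staging.write(key + "\n")
            added.append(key)
    return added
```

The staging file is opened in append mode so that an interrupted crawl can resume without losing words already staged.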
@camarm-dev camarm-dev added the word About definition and word adding. label Jun 9, 2024
@camarm-dev
Owner Author

This is not a good idea: many pages are not useful for a dictionary like Remède. It may be better to find another wordlist.

@camarm-dev
Owner Author

camarm-dev commented Jun 9, 2024

Test with https://github.com/lorenbrichter/Words/blob/master/Words/fr.txt

  • Iterate over all words and check whether each one is already in the wordlist
  • Get the phoneme from Wiktionary
  • Add the word to the file
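
A minimal sketch of the loop above, under several assumptions: the French Wiktionary marks pronunciation with a `{{pron|…|fr}}` template, raw wikitext is fetched with MediaWiki's `action=raw`, and the names `extract_phoneme`, `fetch_wikitext` and `collect_phonemes` are hypothetical rather than part of Remède.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Assumed template shape on fr.wiktionary.org, e.g. {{pron|ʃa|fr}}.
PRON_RE = re.compile(r"\{\{pron\|([^|}]+)\|fr\}\}")


def extract_phoneme(wikitext):
    """Return the first French pronunciation found in the wikitext, or None."""
    match = PRON_RE.search(wikitext)
    return match.group(1) if match else None


def fetch_wikitext(word):
    """Fetch the raw wikitext of a page; return None if the request fails."""
    query = urllib.parse.urlencode({"title": word, "action": "raw"})
    request = urllib.request.Request(
        "https://fr.wiktionary.org/w/index.php?" + query,
        headers={"User-Agent": "wordlist-sketch/0.1"},  # placeholder value
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError:
        return None


def collect_phonemes(words, known, fetch=fetch_wikitext):
    """Yield (word, phoneme) for words that are new and have a phoneme."""
    for word in words:
        if word in known:
            continue  # already in the wordlist
        wikitext = fetch(word)
        phoneme = extract_phoneme(wikitext) if wikitext else None
        if phoneme:
            yield word, phoneme
```

Passing `fetch` as a parameter keeps the loop testable offline, for example by supplying a dictionary lookup instead of a network call.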

camarm-dev added a commit that referenced this issue Jun 15, 2024
Version 1.2.0 with patches for #166, #167, #168 and #170
@camarm-dev camarm-dev moved this to Ready in Remède ROADMAP Jun 28, 2024
@camarm-dev camarm-dev moved this from Ready to Done in Remède ROADMAP Jun 28, 2024