Define local-words and local-language #20

Open
minad opened this issue Nov 15, 2024 · 7 comments


minad commented Nov 15, 2024

Hi Augusto,

I wonder if we should make a push to define local-words and local-language as a standard for spell checkers, but only if we find common ground, of course. We could add support in Jit-spell and in Jinx, and open a feature request (or even send patches) on the Emacs bug tracker. I suggest using strings, since this leads to a compact representation in the files:

(defcustom local-language "de_DE en_US" ;; Or local-languages?
  "File local languages, space separated."
  :type 'string
  :local t
  :safe #'stringp)

(defcustom local-words "some words separated by space"
  "File local words, space separated."
  :type 'string
  :local t
  :safe #'stringp)

There is of course a possibility to paint the bike in different colors. Since there is already LocalWords: I think local-words is the most obvious upgrade to file-local variables. As alternative names we could consider spell-words and spell-language, but these may restrict the purpose of these variables too much.
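
For comparison, here is a rough sketch of how the two mechanisms might look at the end of an Emacs Lisp file. The LocalWords: line is the existing ispell convention; the Local Variables block uses the variable names proposed above, which are assumptions and are not defined by Emacs today. The words and language codes are placeholders.

```elisp
;; Existing ispell convention: extra words accepted for this file only.
;; LocalWords:  Jinx gptel astoff

;; Proposed alternative (hypothetical variable names from this issue):
;; Local Variables:
;; local-language: "de_DE en_US"
;; local-words: "Jinx gptel astoff"
;; End:
```

Because both variables are plain strings, each fits on a single line of the Local Variables block, which is what keeps the notation about as compact as LocalWords:.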

Owner

astoff commented Nov 16, 2024

I like the idea. I think it solves some real problems.

As to the bike color: I think spell-language and spell-words sound much more self-explanatory. What other purpose do you anticipate for those vars that might make that name seem out of place?

Also, I think I kind of see the purpose of making {local/spell}-words a space separated string. I haven't thought about the consequences. But for {local/spell}-language, my first impulse would be to make it of type "symbol or list of symbols". Did you consider that?

Author

minad commented Nov 16, 2024

> As to the bike color: I think spell-language and spell-words sound much more self-explanatory. What other purpose do you anticipate for those vars that might make that name seem out of place?

For local-words I don't see many other purposes, but for local-language there certainly are some. Think about exporting with metadata, e.g., HTML can specify the language, and Org export uses the language. So to me there seems to be value beyond spell checking in the ability to specify that a given file is written in this or these languages.

> But for {local/spell}-language, my first impulse would be to make it of type "symbol or list of symbols". Did you consider that?

Symbols would work too I guess. Emacs already comes with a bunch of language variables and it doesn't seem very consistent. Some are strings (current-language-environment, local-languages-names, org-latex-babel-language-alist), some are symbols (current-iso639-language). I think I slightly prefer strings, since they have more of a free-form nature, and since there won't be distinct types for single and multiple languages. But then language identifiers should be quite uniform so symbols would work.

In any case, I would desperately want to avoid using a list for local-words. If we ended up with a list of strings, I would very much dislike that for its verbosity and ugliness. Note that we want to end up with something that is more or less competitive with the existing LocalWords mechanism, so if the result is far less readable it will not work out. That's probably also why I prefer local-language to be a string: so that both variables are similar in their type, and to avoid such a debate in the first place.

Owner

astoff commented Nov 16, 2024

> Think about exporting with metadata, e.g., html can specify the language or Org export uses the language.

Okay, but I think it's hard to arrive at a sound specification at such a level of generality. For example, I don't think HTML allows multiple languages in the lang attribute, while for spelling it makes sense. (Also note that, strictly speaking, spell-checking is about dictionaries; e.g., you could have an English dictionary of medical terms to use combined with the general English dictionary. I doubt this feature is widely used, though.)

Also, even if such a general document-language were defined, it might still make sense to have a spell-language override, i.e., the spellchecker would use (or spell-language document-language) to decide what to do.
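
A minimal sketch of that override, assuming both variables exist and default to nil. Neither is defined in Emacs today, and the helper name my-spell-effective-language is made up for illustration.

```elisp
;; Hypothetical variables from this discussion; not part of Emacs.
(defvar document-language nil
  "Languages of the current document, or nil to use the system locale.")

(defvar spell-language nil
  "Spell-checking override for `document-language', or nil for none.")

(defun my-spell-effective-language ()
  "Return the language the spell checker should use, or nil.
`spell-language' takes precedence over `document-language'."
  (or spell-language document-language))

;; Only the general variable is set: the spell checker follows it.
(let ((document-language "de_DE")
      (spell-language nil))
  (my-spell-effective-language))   ; => "de_DE"

;; The override is set: it wins.
(let ((document-language "de_DE")
      (spell-language "en_US"))
  (my-spell-effective-language))   ; => "en_US"
```

A nil result would mean "fall back to the system locale", which a spell checker would have to resolve on its own.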

> I think I slightly prefer strings, since they have more of a free-form nature, and since there won't be distinct types for single and multiple languages.

I guess that was my point: the language identifiers are more or less fixed, no? There are perhaps about a hundred usual ones, a few thousand if one counts rare languages.

> In any case, I would desperately want to avoid using a list for local-words.

Fair enough. Does it mean it has to fit in one long line? Do you mean ?\s-separated or whitespace-separated, allowing ?\n as well?


Now, those were the bikesheddy comments :-) . The point I really want to push is that I dislike the local-language name, now that I think about it. The reason is that, while one might typically leave that variable globally as nil (meaning to use the system's locale language), it should be possible to set the language to some value globally and override it project- or buffer-locally. And if one does so, local-language is an awkward name. Put differently, people might be confused and ask why global-language is not a variable.

Author

minad commented Nov 17, 2024

> Okay, but I think it's hard to arrive at a sound specification at such a level of generality.

I think if one specifies that each language must be a language code, e.g., a code of the form de_DE.something, then this is already restrictive enough to work for multiple purposes.

> Also, even if such a general document-language were defined, it might still make sense to have a spell-language override, i.e., the spellchecker would use (or spell-language document-language) to decide what to do.

That's probably right.

> I guess that was my point: the language identifiers are more or less fixed, no? There are perhaps about a hundred usual ones, a few thousand if one counts rare languages.

There are rare ones, if one uses custom dictionaries.

> Fair enough. Does it mean it has to fit in one long line? Do you mean ?\s-separated or whitespace-separated, allowing ?\n as well?

I meant whitespace in general, including newlines etc. split-string with default arguments would work as normalization.
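
As a small illustration of that normalization: split-string called with only the string argument splits on runs of whitespace (spaces, tabs, newlines) and drops empty strings. The helper name my-normalize-words is made up for this sketch.

```elisp
(defun my-normalize-words (words)
  "Split the string WORDS on any run of whitespace.
With no separator argument, `split-string' uses
`split-string-default-separators' and omits empty strings."
  (split-string words))

;; Mixed spaces, a newline, and a tab all act as separators.
(my-normalize-words "Jinx  gptel\n\tastoff ")
;; => ("Jinx" "gptel" "astoff")
```

This means a long word list can be wrapped over several lines in the file without changing its meaning.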

> Now, those were the bikesheddy comments :-) . The point I really want to push is that I dislike the local-language name, now that I think about it. The reason is that, while one might typically leave that variable globally as nil (meaning to use the system's locale language), it should be possible to set the language to some value globally and override it project- or buffer-locally. And if one does so, local-language is an awkward name. Put differently, people might be confused and ask why global-language is not a variable.

There already exist global language environment variables. But now that you've mentioned the prefix document-*, I would like that more. It could be set globally (as default for all documents) or file-locally.

So we either use document-language/document-words or spell-language/spell-words. Even if we use the less generic spell-language, the variable could still be used for other purposes, and other language-dependent packages could be configured to rely on it, so in the end it doesn't matter much.

Owner

astoff commented Nov 17, 2024

All right, I think that sounds convincing ;-). What's the next step you would like to take with this proposal?

Author

minad commented Nov 17, 2024

Either implement it here and in Jinx or propose a patch upstream to ispell.el/flyspell.el? The question is also where the variables should live, maybe in files.el?

Author

minad commented Dec 5, 2024

I asked karthik's gptel to generate a bug report for the Emacs bug tracker from this GH discussion ;)


Feature Request: Add File-Local Language and Words Variables

Summary:

Introduce document-language and document-words as file-local variables in
Emacs. These should standardize settings for languages and custom words across
modes like ispell and flyspell, and also for document export.

Details:

  • document-language: Use a space-separated string of language codes (e.g.,
    "de_DE en_US") to specify languages associated with the file. This can aid not
    only spell-checkers but also support language specification in exports (e.g.,
    HTML, Org mode).

  • document-words: Use a whitespace-separated string of custom words for the
    document to enhance spell-checking by capturing specific terminology.

Implementation Steps:

  1. Implement the variables, perhaps in files.el for broad accessibility.
  2. Maintain compatibility with existing systems like LocalWords.
  3. Use general whitespace (including newlines) as separators in
    document-words.

Considerations:

  • Clarify the distinction and relation between global language settings and
    these file-local ones.
  • Naming discussions are welcome, focusing on whether document- or more
    specific prefixes like spell- should be used.

Next Actions:

  1. Finalize variable names and their module locations.
  2. Develop a patch for Emacs, targeting modules like ispell.el or
    flyspell.el.
  3. Seek and incorporate community feedback for refinement.

These updates aim to improve multilingual document handling in Emacs and make
configuration more consistent across various uses.
