This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Inline asterisks are not closed properly #117

Open
pepelsbey opened this issue Jul 27, 2015 · 10 comments

@pepelsbey

The use case “make the first letter of a word bold or italic” works fine both in Atom's Markdown preview and on GitHub, just as it does in Sublime Text. But there’s a problem in the GFM language module that prevents proper highlighting right in the editor:

*f*oo *b*ar baz
**f**oo **b**ar baz
  1. Atom editor
  2. Atom Markdown preview (proper behaviour)
  3. Sublime Text editor (proper behaviour)

[Screenshot comparing the three renderings listed above]

Only the first letters of the first two words should be bold/italic, not the whole word and especially not the rest of the text after it.

@ootz0rz

ootz0rz commented Aug 4, 2015

Similar issue with underscores. Example:

Blah blah _ blah
blah blah blah foo bar
fooooo _ oh hi

Everything between those two underscores is highlighted as italic in the editor, even though the preview correctly renders it as plain text.

@burodepeper

[Screenshot, 2015-10-02 10:53:52]

Work in progress...

@edent

edent commented Nov 26, 2015

I'm seeing the same issue (latest .deb version)

[Screenshot: boling]

The syntax highlighting really doesn't like it if the *s are inline.

@burodepeper

@edent: try the language-markdown package and let me know if that solves your problem

@edent

edent commented Nov 26, 2015

@burodepeper that fixes the formatting - but for some reason the preview pane won't show up!

@burodepeper

@edent You mean the Markdown preview? Could you perhaps create an issue with as many relevant details as possible? Then I'll have a look tomorrow.

@queuedq

queuedq commented Apr 7, 2017

This phenomenon is due to 17a9412. The commit message is:

Whitespace after opening and before closing an tag is invalid ( e.g. _ text_ or _text _ ).

Before opening and after closing tags the only character accepted is anything but a word or a digit ( 2*text* and d*text* are invalid but not $*text*$ ).

The second sentence is wrong in the context of the GFM spec. I have no idea why this commit was accepted.
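
To make the effect of that second rule concrete, here is a minimal Python sketch that replays the old `*` italic begin/end patterns (the `-` lines of the fifth hunk in the diff further down) against the line from the original report. It is only an approximation: Atom uses Oniguruma and tokenizes with nested patterns, while Python's `re` rejects lookbehinds whose alternatives differ in width, so the lookbehinds are split into equivalent pieces. The helper name `first_span` is made up for this illustration.

```python
import re

# Python transcription of the old `*` italic rule.
# `(?<=^|X)` is rewritten as `(?:(?<=^)|(?<=X))` and `(?<!^|\s)` as
# `(?<!^)(?<!\s)`, because Python needs fixed-width lookbehinds.
OLD_BEGIN = re.compile(r"(?:(?<=^)|(?<=[^\w\d*]))\*(?!$|\*|\s)")
OLD_END = re.compile(r"(?<!^)(?<!\s)\**\*(?=$|[^\w|\d])")


def first_span(line, begin, end):
    """Return (start, stop) of the first begin/end region in `line`.

    `stop` is None when the region is opened but never closed, which in a
    begin/end grammar rule means the scope keeps running past this line.
    Returns None when nothing opens at all.
    """
    opened = begin.search(line)
    if opened is None:
        return None
    # Passing a start position (instead of slicing) lets the lookbehinds
    # still see the characters before it.
    closed = end.search(line, opened.end())
    return (opened.start(), closed.end() if closed else None)


print(first_span("*f*oo *b*ar baz", OLD_BEGIN, OLD_END))  # (0, None)
print(first_span("foo *bar* baz", OLD_BEGIN, OLD_END))    # (4, 9)
```

In the first line every candidate closing `*` is either followed by a word character or preceded by whitespace, so the old end pattern never matches and the italic scope stays open, which is consistent with the highlighting described at the top of this issue.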

Using language-markdown instead seems to resolve the problem. However, since language-gfm is a core Atom package, it would be preferable to fix the problem in this package.

I have tried to fix this issue, but I ran into several difficulties:

  1. The original spec is quite complicated to express in regex.
    1. It introduces several new definitions: delimiter run, punctuation character, left-flanking and right-flanking delimiter run.
    2. Some rules, including the 9th rule, are difficult to implement.
  2. I am not familiar with Atom's grammar syntax. I don't know whether those rules can be implemented in some way other than plain regex matching.
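
For readers who have not seen the spec definitions mentioned in the list above, the following Python sketch classifies a single delimiter run as left-flanking and/or right-flanking and derives whether it may open or close emphasis, following the CommonMark wording as I understand it. It is an approximation, not a parser: `str.isspace()` stands in for "Unicode whitespace", punctuation is taken as ASCII punctuation plus the Unicode `P*` categories, and the function names are invented for this example.

```python
import string
import unicodedata


def _is_space(ch):
    # The start and end of the line (None here) count as whitespace.
    return ch is None or ch.isspace()


def _is_punct(ch):
    # Approximation: ASCII punctuation plus Unicode P* categories.
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def classify_run(text, start, end):
    """Return (can_open, can_close) for the delimiter run text[start:end]."""
    before = text[start - 1] if start > 0 else None
    after = text[end] if end < len(text) else None

    left_flanking = not _is_space(after) and (
        not _is_punct(after) or _is_space(before) or _is_punct(before)
    )
    right_flanking = not _is_space(before) and (
        not _is_punct(before) or _is_space(after) or _is_punct(after)
    )

    if text[start] == "*":
        return left_flanking, right_flanking
    # `_` is stricter, so that intraword underscores do not emphasize.
    can_open = left_flanking and (not right_flanking or _is_punct(before))
    can_close = right_flanking and (not left_flanking or _is_punct(after))
    return can_open, can_close


print(classify_run("*f*oo", 0, 1))              # (True, False)
print(classify_run("*f*oo", 2, 3))              # (True, True)
print(classify_run("Blah blah _ blah", 10, 11)) # (False, False)
print(classify_run("snake_case", 5, 6))         # (False, False)
```

The second call shows why `*f*oo` is valid emphasis under the spec (an intraword `*` can close), and the third shows why the lone `_` from the earlier comment should not start italics at all. Pairing openers with closers, which is where the harder rules come in, is not covered here.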

And this is my attempt to fix the problem:
(Note that the line numbers might differ from the current version, since I downloaded the file before some newer commits were made. I have checked that those commits are not related to this issue.)

diff --git a/grammars/gfm.cson b/grammars/gfm.cson
index 06759a1..67d95e6 100644
--- a/grammars/gfm.cson
+++ b/grammars/gfm.cson
@@ -16,8 +16,8 @@
     'name': 'constant.character.escape.gfm'
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*\\*\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\*\\*\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*{3}(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*{3}(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*{3}(?!\\*)|(?<!^|\\s|\\*)\\*{3}(?!\\w|\\*)'
     'name': 'markup.bold.italic.gfm'
     'patterns': [
       {
@@ -32,8 +32,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_])___(?!$|_|\\s)'
-    'end': '(?<!^|\\s)___*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)___(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)___(?!\\w|_)'
     'name': 'markup.bold.italic.gfm'
     'patterns': [
       {
@@ -48,8 +48,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\*\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*\\*(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*\\*(?!\\*)|(?<!^|\\s|\\*)\\*\\*(?!\\w|\\*)'
     'name': 'markup.bold.gfm'
     'patterns': [
       {
@@ -64,8 +64,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_])__(?!$|_|\\s)'
-    'end': '(?<!^|\\s)__*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)__(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)__(?!\\w|_)'
     'name': 'markup.bold.gfm'
     'patterns': [
       {
@@ -80,8 +80,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*(?!\\*)|(?<!^|\\s|\\*)\\*(?!\\w|\\*)'
     'name': 'markup.italic.gfm'
     'patterns': [
       {
@@ -96,8 +96,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_\\{\\}])_(?!$|_|\\s)'
-    'end': '(?<!^|\\s)_*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)_(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)_(?!\\w|_)'
     'name': 'markup.italic.gfm'
     'patterns': [
       {

Although this change fixes the issue, it is by no means a good solution. I ignored characters that are neither Unicode whitespace, punctuation characters, nor regex word characters. I compressed the rules into hard-to-read regular expressions. Not all of the spec's rules are applied, and I even reinterpreted some of them to fit them into a regex.
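
As a rough check of the `*` italic patterns from the diff above, this Python sketch replays them on the line from the original report. It is a sketch under assumptions: the patterns are transcribed for Python's `re` (`(?<!\w|\*)` becomes `(?<![\w*])` and `(?<!^|\s|\*)` becomes `(?<!^)(?<![\s*])`), only the single-`*` rule is simulated, and the bold rules and nested patterns of the real grammar are ignored. `italic_spans` is a hypothetical helper, not part of language-gfm.

```python
import re

# Python transcription of the proposed `*` italic begin/end patterns.
NEW_BEGIN = re.compile(r"(?<!\*)\*(?!\*)(?=\w)|(?<![\w*])\*(?!$|\s|\*)")
NEW_END = re.compile(r"(?<=\w)(?<!\*)\*(?!\*)|(?<!^)(?<![\s*])\*(?!\w|\*)")


def italic_spans(line):
    """Return the substrings that the begin/end pair would scope as italic."""
    spans, pos = [], 0
    while True:
        opened = NEW_BEGIN.search(line, pos)
        if opened is None:
            return spans
        closed = NEW_END.search(line, opened.end())
        if closed is None:
            # Never closed: the scope would run on to the end of the line.
            spans.append(line[opened.start():])
            return spans
        spans.append(line[opened.start():closed.end()])
        pos = closed.end()


print(italic_spans("*f*oo *b*ar baz"))  # ['*f*', '*b*']
print(italic_spans("2*text* ok"))       # ['*text*']
```

With these patterns only the first letters are scoped, and `2*text*`, which the old commit message declared invalid, is treated as emphasis.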

I hope this issue will be fixed soon. It is quite inconvenient for users.

EDIT: FYI, you can test your markdown syntax in the commonmark.js dingus.

@winstliu
Contributor

winstliu commented Apr 7, 2017

> I have no idea why this commit was accepted.

Probably because the spec didn't exist in 2014 😉.

However, I do agree that this needs to be fixed, and that language-gfm is not in a very good state at the moment. I cannot give you an ETA when I personally will be able to investigate this issue given all the other language issues that are open.

@miller-time

I'm encountering this bug, and I was curious what it might take to fix it.

My example:

* *note: this emphasized bullet has one **bold** word in it*

(only *note: this emphasized bullet has one **bold* is colored with italicized formatting, **bold** is not bold)

I'm pretty sure the problem is that regex cannot be used to parse nested structures (asterisk regions within asterisk regions).

I'm not very familiar with Atom development, but after reading the language grammars docs, it sounds like the existing language-gfm grammar is a TextMate ("legacy") implementation. Has upgrading to tree-sitter been considered?
