Inline asterisks are not closed properly #117

pepelsbey · 2015-07-27T15:29:23Z

The use case “make the first letter of a word bold or italic” works fine in both Atom's Markdown preview and on GitHub, just as it does in Sublime Text. But a problem in the GFM language module prevents proper highlighting in the editor itself:

*f*oo *b*ar baz
**f**oo **b**ar baz

[Screenshots: Atom editor; Atom Markdown preview (proper behaviour); Sublime Text editor (proper behaviour)]

Only the first letters of the first two words should be bold/italic, not the whole word, and especially not the rest of the text after it.

ootz0rz · 2015-08-04T06:57:38Z

Similar issue with underscores. Example:

Blah blah _ blah
blah blah blah foo bar
fooooo _ oh hi

Everything between those two "_" characters is highlighted as italic, even though it is rendered correctly, without italics.

burodepeper · 2015-10-02T08:54:21Z

Work in progress...

edent · 2015-11-26T18:57:35Z

I'm seeing the same issue (latest .deb version)

The syntax highlighting really doesn't like it if the *s are inline.

burodepeper · 2015-11-26T19:00:03Z

@edent: try the language-markdown package and let me know if that solves your problem

edent · 2015-11-26T19:38:44Z

@burodepeper that fixes the formatting - but for some reason the preview pane won't show up!

burodepeper · 2015-11-26T19:47:50Z

@edent You mean the Markdown preview? Could you create an issue with as many relevant details as possible? Then I'll have a look tomorrow.

queuedq · 2017-04-07T09:16:57Z

This behaviour is due to commit 17a9412. Its commit message reads:

> Whitespace after opening and before closing an tag is invalid ( e.g. _ text_ or _text _ ).
>
> Before opening and after closing tags the only character accepted is anything but a word or a digit ( 2*text* and d*text* are invalid but not $*text*$ ).

The second sentence is wrong according to the GFM spec. I have no idea why this commit was accepted.
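To see why the second rule breaks intraword emphasis, the pre-patch italic patterns can be tried against the example from the original report. This is only a rough sketch: the patterns are Python approximations of the `begin`/`end` regexes that the diff further down removes (Atom's regex engine accepts variable-width lookbehinds such as `(?<=^|...)`, Python's `re` does not, so they are split into fixed-width pieces), and `openers`/`closers` are hypothetical helper names.

```python
import re

# Approximations of the pre-patch italic rule in grammars/gfm.cson.
# Assumption: splitting the lookbehinds keeps the same behaviour on one line.
OLD_BEGIN = re.compile(r"(?:^|(?<=[^\w\d*]))\*(?!$|\*|\s)")
OLD_END = re.compile(r"(?<!^)(?<!\s)\**\*(?=$|[^\w|\d])")


def openers(text):
    """Indices where the old rule would let an italic span begin."""
    return [m.start() for m in OLD_BEGIN.finditer(text)]


def closers(text):
    """Indices where the old rule would let an italic span end."""
    return [m.start() for m in OLD_END.finditer(text)]


sample = "*f*oo *b*ar baz"
print(openers(sample))  # asterisks accepted as opening delimiters
print(closers(sample))  # asterisks accepted as closing delimiters
```

For the sample line, no asterisk qualifies as a closer, because each candidate is either followed by a word character or preceded by whitespace. The italic scope opened at index 0 therefore never ends, which matches the reported behaviour.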

Using language-markdown instead seems to resolve the problem. However, since language-gfm is a core Atom package, it would be preferable to fix the problem within this package.

I have tried to fix this issue, but I ran into several difficulties:

- The original spec is quite complicated to represent in regex.
  1. It introduces several new definitions: delimiter run, punctuation character, left(right)-flanking delimiter run.
  2. Some rules, including the 9th rule, are difficult to implement.
- I am not used to Atom's grammar syntax. I don't know whether those rules can be implemented in some way other than plain regex matching.
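For reference, the flanking definitions mentioned above can be sketched outside of regex. The following is a minimal illustration, not the spec's reference implementation: `flanking` and its helpers are hypothetical names, whitespace is approximated with `str.isspace()`, and punctuation is approximated as ASCII punctuation plus the Unicode `P*` categories.

```python
import string
import unicodedata


def is_ws(ch):
    # The start and end of the line count as whitespace (ch is None there).
    return ch is None or ch.isspace()


def is_punct(ch):
    # Approximation: ASCII punctuation or any Unicode P* general category.
    if ch is None:
        return False
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def flanking(text, start, length):
    """Return (left_flanking, right_flanking) for the delimiter run
    text[start:start + length]."""
    before = text[start - 1] if start > 0 else None
    end = start + length
    after = text[end] if end < len(text) else None
    left = not is_ws(after) and (
        not is_punct(after) or is_ws(before) or is_punct(before)
    )
    right = not is_ws(before) and (
        not is_punct(before) or is_ws(after) or is_punct(after)
    )
    return left, right


print(flanking("*f*oo", 0, 1))             # opening asterisk
print(flanking("*f*oo", 2, 1))             # asterisk inside the word
print(flanking("Blah blah _ blah", 10, 1)) # lone underscore
```

In the spec, a `*` run can open emphasis when it is left-flanking and close it when it is right-flanking. The asterisk inside `*f*oo` is both, so it may close the span, while a lone `_` surrounded by spaces is neither and should not start or end anything.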

This is my attempt to fix the problem:
(Note that the line numbers might differ from the current version, since I downloaded the file before some newer commits were made. I have checked that those commits are unrelated to this issue.)

diff --git a/grammars/gfm.cson b/grammars/gfm.cson
index 06759a1..67d95e6 100644
--- a/grammars/gfm.cson
+++ b/grammars/gfm.cson
@@ -16,8 +16,8 @@
     'name': 'constant.character.escape.gfm'
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*\\*\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\*\\*\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*{3}(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*{3}(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*{3}(?!\\*)|(?<!^|\\s|\\*)\\*{3}(?!\\w|\\*)'
     'name': 'markup.bold.italic.gfm'
     'patterns': [
       {
@@ -32,8 +32,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_])___(?!$|_|\\s)'
-    'end': '(?<!^|\\s)___*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)___(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)___(?!\\w|_)'
     'name': 'markup.bold.italic.gfm'
     'patterns': [
       {
@@ -48,8 +48,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\*\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*\\*(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*\\*(?!\\*)|(?<!^|\\s|\\*)\\*\\*(?!\\w|\\*)'
     'name': 'markup.bold.gfm'
     'patterns': [
       {
@@ -64,8 +64,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_])__(?!$|_|\\s)'
-    'end': '(?<!^|\\s)__*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)__(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)__(?!\\w|_)'
     'name': 'markup.bold.gfm'
     'patterns': [
       {
@@ -80,8 +80,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d\\*])\\*(?!$|\\*|\\s)'
-    'end': '(?<!^|\\s)\\**\\*(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\*)\\*(?!\\*)(?=\\w)|(?<!\\w|\\*)\\*(?!$|\\s|\\*)'
+    'end': '(?<=\\w)(?<!\\*)\\*(?!\\*)|(?<!^|\\s|\\*)\\*(?!\\w|\\*)'
     'name': 'markup.italic.gfm'
     'patterns': [
       {
@@ -96,8 +96,8 @@
     ]
   }
   {
-    'begin': '(?<=^|[^\\w\\d_\\{\\}])_(?!$|_|\\s)'
-    'end': '(?<!^|\\s)_*_(?=$|[^\\w|\\d])'
+    'begin': '(?<!\\w|_)_(?!$|\\s|_)'
+    'end': '(?<!^|\\s|_)_(?!\\w|_)'
     'name': 'markup.italic.gfm'
     'patterns': [
       {

Though this change fixes the issue, it is by no means a good solution. I ignored characters that are neither Unicode whitespace, punctuation characters, nor regex word characters. I compacted the rules into poorly readable regular expressions. Not all of the rules are implemented, and I even reinterpreted some of them to fit them into regular expressions.
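As a quick sanity check, the proposed asterisk patterns from the diff can be run against the two lines from the original report. This is a simplified sketch under several assumptions: the regexes are Python approximations (variable-width lookbehinds such as `(?<!^|\s|\*)` are split into fixed-width pieces because Python's `re` rejects them), `scoped_spans` is a hypothetical helper that applies a single begin/end pair to one line, and nested patterns and rule ordering in the real grammar are ignored.

```python
import re

# Approximations of the proposed italic and bold rules from the diff.
ITALIC_BEGIN = re.compile(r"(?<!\*)\*(?!\*)(?=\w)|(?<![\w*])\*(?!$|\s|\*)")
ITALIC_END = re.compile(r"(?<=\w)(?<!\*)\*(?!\*)|(?<!^)(?<![\s*])\*(?!\w|\*)")
BOLD_BEGIN = re.compile(r"(?<!\*)\*\*(?!\*)(?=\w)|(?<![\w*])\*\*(?!$|\s|\*)")
BOLD_END = re.compile(r"(?<=\w)(?<!\*)\*\*(?!\*)|(?<!^)(?<![\s*])\*\*(?!\w|\*)")


def scoped_spans(text, begin, end):
    """Collect the substrings one begin/end rule would scope.
    An unclosed span runs to the end of the input."""
    spans = []
    pos = 0
    while True:
        opened = begin.search(text, pos)
        if opened is None:
            return spans
        closed = end.search(text, opened.end())
        stop = closed.end() if closed else len(text)
        spans.append(text[opened.start():stop])
        if closed is None:
            return spans
        pos = stop


print(scoped_spans("*f*oo *b*ar baz", ITALIC_BEGIN, ITALIC_END))
print(scoped_spans("**f**oo **b**ar baz", BOLD_BEGIN, BOLD_END))
```

Under these assumptions, only the delimited first letters are scoped, which is the behaviour requested at the top of this issue. It says nothing about the cases the patch deliberately leaves out.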

I hope this issue gets fixed soon. It is a real inconvenience for users.

EDIT: FYI, you can test your markdown syntax in the commonmark.js dingus.

winstliu · 2017-04-07T14:47:44Z

> I have no idea why this commit was accepted.

Probably because the spec didn't exist in 2014 😉.

However, I do agree that this needs to be fixed, and that language-gfm is not in a very good state at the moment. I cannot give you an ETA for when I will personally be able to investigate this issue, given all the other language issues that are open.

miller-time · 2019-05-22T19:35:04Z

I'm encountering this bug, and I was curious what it might take to fix it.

my example:

* *note: this emphasized bullet has one **bold** word in it*

(only *note: this emphasized bullet has one **bold* is colored with italicized formatting, **bold** is not bold)

I'm pretty sure the problem is that regex cannot be used to parse nested structures (asterisk regions within asterisk regions).

I'm not very familiar with Atom development, but after reading the language grammars docs, it sounds like the existing language-gfm grammar is a TextMate ("legacy") implementation. Has upgrading to tree-sitter been considered?

Related issues and events:

lee-dohm mentioned this issue Feb 11, 2016: Syntax highlight doesn't mark bold properly after/before alnum characters #138 (closed)
winstliu mentioned this issue Feb 29, 2016: Bold or Italic syntax highlighting works incorrectly with CJK words. #143 (closed)
rsese mentioned this issue Mar 28, 2017: Formatting Issue with Bold Text in Editor using Markdown atom/atom#14075 (closed)
winstliu added the bug label Apr 7, 2017
rsese mentioned this issue May 6, 2017: Syntax highlighting errors when letter **B**olding #203 (closed)
winstliu mentioned this issue May 22, 2017: wrong syntax highlighted for blod #209 (closed)
rsese mentioned this issue Nov 8, 2017: Bad syntax highlighting when character in word is wrapped in italic or bold tags #217 (closed)
rsese mentioned this issue Dec 2, 2017: bold syntax overflow #220 (closed)
