Skip to content

Anchor links changed? #2002

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
To1ne opened this issue May 7, 2025 · 9 comments
Open

Anchor links changed? #2002

To1ne opened this issue May 7, 2025 · 9 comments

Comments

@To1ne
Copy link
Collaborator

To1ne commented May 7, 2025

I ran across a link on the interwebs:

https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---recurse-submodulesltpathspecgt

and I noticed it's not anchoring to the correct piece of the page. The nowadays seems to be:

https://git-scm.com/docs/git-clone#Documentation/git-clone.txt-code--recurse-submodulesltpathspecgtcode

What changed the code to be added around the anchor link? I'd prefer to not have this behavior broken.

@To1ne
Copy link
Collaborator Author

To1ne commented May 7, 2025

I might look into this myself later, but I had to write this down and do something else first.

@dscho
Copy link
Member

dscho commented May 7, 2025

Oh no! I highly suspect the culprit to be related to the recent style changes (@jnavila does this sound plausible to you?)

@To1ne
Copy link
Collaborator Author

To1ne commented May 7, 2025

Oh no! I highly suspect the culprit to be related to the recent style changes

I hope it's recent and not broken since the Rails to Hugo transition, otherwise I expect both types of permalinks to be around on the interwebs. So that means we'd have to figure out a way to support both.

@dscho Shall/Can we add test coverage for these permalinks?

@jnavila
Copy link
Contributor

jnavila commented May 7, 2025

Oh no! I highly suspect the culprit to be related to the recent style changes (@jnavila does this sound plausible to you?)

I suspect so. All the command options are now formatted as <code>, and it seems this bleeds into the automatic anchor generation... Quite an inexpected side-effect. We are never sure that the machine generated anchor name is reproducible.

@jnavila
Copy link
Contributor

jnavila commented May 7, 2025

the generative part is here:

# HTML anchor on hdlist1 (i.e. command options)
html.gsub!(/<dt class="hdlist1">(.*?)<\/dt>/) do |_m|
text = $1.tr("^A-Za-z0-9-", "")
# use txt_path for backward compatibility of already-existing
# deep links shared across the interwebs.
anchor = "#{txt_path}-#{text}"
# handle anchor collisions by appending -1
anchor += "-1" while ids.include?(anchor)

And indeed it works on final html and does not select the "text" part of the html node.

@jnavila
Copy link
Contributor

jnavila commented May 7, 2025

We may have similar issues where I added "<placehoder>" marks. And it will have even a larger change effect after applying the new style formatting of #1921

@dscho
Copy link
Member

dscho commented May 7, 2025

I hope it's recent and not broken since the Rails to Hugo transition

It is recent: https://web.archive.org/web/20250113050613/https://git-scm.com/docs/git-clone#Documentation/git-clone.txt-code-lcode has the old-style links, and is well after the transition.

Shall/Can we add test coverage for these permalinks?

Absolutely! It's relatively easy in the Playwright tests, too, all you do is to call await page.goto(...) with, say, the git-clone page and then verify the link, like this:

// the repository URL is correct
await expect(page.getByRole('link', { name: 'hosted on GitHub' }))
.toHaveAttribute('href', 'https://github.com/progit/progit2')

You probably need to use a different way to locate the link, as its text is not "--local", instead the element following the link has this text, possibly something like this:

// The `<a>` element of class `anchor` directly preceding the `<code>` element whose text is "--local"
const anchor = page.locator('xpath=//code[text()="--local"]/preceding-sibling::a[@class="anchor"][1]')
await expect(anchor).toHaveAttribute('href', /#Documentation\/git-clone\.txt-l$/)

My XPath is not very good, so I'll refrain from refining it: It should probably catch instances where the "--local" text is not enclosed in a tag, too.

it works on final html and does not select the "text" part of the html node.

Yes, and that's most likely really wrong. I guess the easiest way to fix this would not be to switch from html to text, but to do something like this:

-        text = $1.tr("^A-Za-z0-9-", "")
+        text = $1.gsub(/<[^>]*>/, "").tr("^A-Za-z0-9-", "")

This would also proactively address the <placeholder> issue, at the expense of breaking potentially a lot of those <tt> links that are already in wide use.

I guess we could also do this instead:

-        text = $1.tr("^A-Za-z0-9-", "")
+        text = $1.gsub(/<\/?(?:code|placeholder)>/, "").tr("^A-Za-z0-9-", "")

Thoughts?

@jnavila
Copy link
Contributor

jnavila commented May 7, 2025

I was more thinking of settling on a true html parsing such as:

html.gsub!(/<dt class="hdlist1">(.*?)<\/dt>/) do |m| 
  sp = Nokogiri::HTML.parse(m)
  sp.text.tr("^A-Za-z0-9-", "")
end

Even with HTML5 now, we are not limited in the form of ids and do not need the last tr command.

@dscho
Copy link
Member

dscho commented May 8, 2025

@jnavila we're still limited by backwards-compatibility: Previously-working links using anchors should continue to work. That's why I would highly suggest to stick with parsing text rather than html, and to retain the tr() call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants