fix(html-parser): sanitize unclosed tags in markdown rendering #14309

Merged on Feb 26, 2025 (1 commit)

Conversation

WustLCQ
Contributor

@WustLCQ WustLCQ commented Feb 25, 2025

  • Add AST traversal validation for HTML tag syntax
  • Implement regex filter (/<(\w+)[^>]*$/) for malformed tags
  • Escape invalid nested tags instead of throwing parser errors

This resolves rendering crashes caused by unclosed tags such as:

<span>
  0<x<b
</span>

The sanitization preserves legitimate HTML tags while converting invalid structures to plain text.
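As a rough illustration of the regex filter listed above, here is a minimal sketch that applies the quoted pattern line by line and escapes everything from the first unclosed `<` onward. The helper name `escapeUnclosedTags`, the line-by-line strategy, and the use of `&lt;` as the escape are assumptions made for this example; the actual diff is not shown in this conversation and may work differently.

```typescript
// Pattern quoted in the PR description: a "<" followed by a tag name,
// with no closing ">" before the end of the string.
const UNCLOSED_TAG = /<(\w+)[^>]*$/;

// Hypothetical helper (not taken from the PR): checks each line on its own
// and turns the unclosed portion into plain text by escaping "<".
function escapeUnclosedTags(markdown: string): string {
  return markdown
    .split("\n")
    .map((line) => {
      const match = UNCLOSED_TAG.exec(line);
      if (match === null) return line; // well-formed or no tags: keep as-is
      const head = line.slice(0, match.index);
      const tail = line.slice(match.index).replace(/</g, "&lt;");
      return head + tail;
    })
    .join("\n");
}

const sample = "<span>\n  0<x<b\n</span>";
console.log(escapeUnclosedTags(sample));
```

With the multi-line example from the description, only the middle line is rewritten, while `<span>` and `</span>` pass through untouched. Note that the pattern only fires when no `>` follows the stray `<` in the string being tested, so the single-line form `<span>0<x<b</span>` would not match under this per-line approach.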

Summary

Resolve #14356

Screenshots

[Images: before and after screenshots]

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

dosubot bot added the size:XS (This PR changes 0-9 lines, ignoring generated files) and 🐞 bug (Something isn't working) labels Feb 25, 2025
@crazywoola
Member

Cool

Member

@crazywoola crazywoola left a comment

LGTM

dosubot bot added the lgtm (This PR has been approved by a maintainer) label Feb 26, 2025
@crazywoola crazywoola merged commit d571158 into langgenius:main Feb 26, 2025
5 checks passed