Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated Dec 23, 2024 - Python
Golang HTML to plaintext conversion library
An ESM-native HTML-to-Markdown library with useful utilities; simple, lightweight, and high quality.
📝 Html2Text - Convert HTML to formatted plain text, e.g. for plain-text emails.
RxNLP APIs for clustering sentences, extracting topics, counting words and n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.
An extremely configurable Markdown reverser for Python 3.
A very simple (but efficient) "HTML to plain text" converter ✍️
Article title, authors, date and body extraction dataset.
DeepSpam milter v2
AI chat app that responds with data in Markdown format, including text and images. Tutorial from: https://youtu.be/qKtM2AlDTs8
Python library for converting HTML to markup or plain text
Go package that cleans an HTML page for better readability.
A CLI tool that fetches a web page's main content and prints it as Markdown.
A classifier that determines, from a document's abstract, whether the document belongs to the cancer class.
html2text Search Command for Splunk
A web scraping project that uses Streamlit, BeautifulSoup, and html2text to extract the content of every page linked from a given URL, convert it to Markdown, and display it. It provides an interactive summary of the visited URLs and shows the extracted content in an easy-to-read format.
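The projects above share one core task: turning HTML into readable plain text. The following is a minimal sketch of that idea using only Python's standard-library `html.parser`; it is not the API of html2text or of any project listed here. It collects visible text, skips `<script>` and `<style>` contents, and breaks lines at a few block-level tags. The `html_to_text` helper and the chosen tag sets are illustrative assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style> and breaking lines at block tags."""

    # Hypothetical tag sets chosen for illustration; real converters handle many more cases.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Ignore text inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: convert an HTML string to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<h1>Title</h1><p>Hello &amp; <b>world</b></p>"))
```

Running the script prints `Title` and `Hello & world` on separate lines. Libraries such as html2text go further, for example by emitting Markdown for links, emphasis, and lists, which this sketch deliberately omits.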