Study of linguistic gender biases in the overview of biographies in the English Wikipedia
A Pythonic wrapper for the Wikipedia API
MediaWiki API wrapper in Python
A Python tool to pull the complete edit history of a Wikipedia page
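MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, aiming to match the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.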
PDF GPT allows you to chat with the contents of your PDF file by using GPT capabilities. The most effective open source solution to turn your PDF files into a chatbot!
Capture screenshots of websites
📸 A GitHub Action to capture screenshots of a website, across Windows, Mac, and Linux
Password protect a static HTML page, decrypted in-browser in JS with no dependency. No server logic needed.
Convert JSON to SQL using Python and sqlite3
A webpage proxy that sends requests through Chromium (Puppeteer). It can be used to bypass Cloudflare anti-bot / anti-DDoS protection from any application (such as curl)
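TextClf: a text classification framework based on PyTorch/Sklearn, including logistic regression, SVM, TextCNN, TextRNN, TextRCNN, DRNN, DPCNN, Bert, and other models. Data processing, model training, and testing can all be run through simple configuration.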
2018-DC-"DataGrand Cup" (达观杯) Text Intelligent Processing Challenge: champion (1st/3131)
qqccmm / AutoHome_spider
Forked from StuPeter/AutoHome_spider. Autohome (汽车之家) crawler that works around font-based anti-scraping.
Random User-Agent middleware based on fake-useragent
qqccmm / Tieba_Spider
Forked from Aqua-Dream/Tieba_Spider. Baidu Tieba crawler (based on Scrapy and MySQL).