Stars
ProxyPin: free, open-source software for capturing HTTP(S) traffic, with support for all platforms
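A visual no-code web crawler/spider. 易采集: a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.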
A spider for Tencent Gongyi data
A curated list of graph-based fraud, anomaly, and outlier detection papers & resources
Source Code for 'Python Data Analytics, 2nd Edition' by Fabio Nelli
Large Scale Chinese Corpus for NLP
A large-scale Chinese natural language inference and semantic similarity dataset
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
An elegant PyTorch implementation of transformers
A Python tool for evaluating the quality of sentence embeddings.
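This project designs a knowledge-graph-based intelligent question-answering customer service system, built on Q&A data recorded from the use of a power grid's intranet email system. The main algorithm involved is SimCSE, an unsupervised text similarity algorithm.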
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
Experiments with several semantic matching models and a comparison of the results.
Text similarity, semantic vectors, text vectors, text-similarity, similarity, sentence-similarity, BERT, SimCSE, BERT-Whitening, Sentence-BERT, PromCSE, SBERT
Tutorial notebook on SimCSE (Ja)
Sentence matching models, including the unsupervised SimCSE, ESimCSE, and PromptBERT, and the supervised SBERT and CoSENT.
A hands-on tutorial for ML beginners to work through SimCSE step by step. It implements both supervised and unsupervised SimCSE, and supports distributed training with Amazon SageMaker.
⭐️ NLP algorithms built with the transformers library, supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT, etc.
Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"