Starred repositories
LlamaIndex is the leading framework for building LLM-powered agents over your data.
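Build a large language model from scratch with only basic Python; build GLM4, Llama3, and RWKV6 step by step from scratch to understand in depth how large models work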
pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models
A collection of Knowledge Tracing model implementations with PyTorch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
This repository contains a reading list of papers on Time Series Forecasting/Prediction (TSF) and Spatio-Temporal Forecasting/Prediction (STF). These papers are mainly categorized according to the …
Word, Excel, and PowerPoint plugin for QuickLook.
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
"Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?"
A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux and Web
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
MTEB: Massive Text Embedding Benchmark
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
ESWA - Expert Systems with Applications LaTeX template
Lime: Explaining the predictions of any machine learning classifier
A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
Code for the paper "CollaborEM: A Self-supervised Entity Matching Framework Using Multi-features Collaboration". TKDE 2021.
Implementation of the paper "Deep Indexed Active Learning for Matching Heterogeneous Entity Representations"
Code for the paper "PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching". VLDB 2023.
wbsg-uni-mannheim / winter
Forked from olehmberg/winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and resu…
A collection of awesome resources regarding Record Linkage.
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples
The dataset for the paper "Machamp: A Generalized Entity Matching Benchmark" published in CIKM 2021