Stars
allRank is a framework for training learning-to-rank neural models based on PyTorch.
The official implementation of the paper "AgentSquare: Automatic LLM Agent Search in Modular Design Space"
vLLM for embedding tasks using original LLMs (Qwen2, LLaMA)
Translate PDFs, EPUBs, webpages, metadata, annotations, and notes into the target language. Supports 20+ translation services.
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
A high-throughput and memory-efficient inference and serving engine for LLMs
A PyTorch implementation of the Transformer model in "Attention is All You Need".