InferenceNexus
Popular repositories
- text-generation-inference (Python, fork of huggingface/text-generation-inference)
  Large Language Model Text Generation Inference
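TGI is typically consumed over HTTP once the server is running. A minimal client sketch against its `/generate` endpoint; the host, port, prompt, and generation parameters below are illustrative assumptions, not values from this repository.

```python
# Client-side sketch for a text-generation-inference server.
# Assumes a TGI instance is already listening at localhost:8080
# (e.g. started from the official Docker image).
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # assumed host/port
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```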
- ipex-llm (Python, fork of intel/ipex-llm)
  Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
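ipex-llm offers a drop-in replacement for the Hugging Face transformers loading API. A minimal sketch, assuming an Intel XPU device is available; the model id is illustrative.

```python
# Hedged sketch of ipex-llm's transformers-compatible API: load a model
# with 4-bit quantization and run it on an Intel XPU.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")  # assumes an XPU device is present

inputs = tokenizer("What is LLM inference?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```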
- litellm (Python, fork of BerriAI/litellm)
  Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
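The point of litellm is that one OpenAI-style call shape covers many providers. A minimal sketch, assuming `OPENAI_API_KEY` is set in the environment; the model names are illustrative.

```python
# One unified completion call; switching providers is a model-string change.
from litellm import completion

response = completion(
    model="gpt-4o-mini",  # illustrative model
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)

# e.g. an Anthropic model via the same call shape (assumed model string):
# completion(model="anthropic/claude-3-haiku-20240307", messages=[...])
```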
- litgpt (Python, fork of Lightning-AI/litgpt)
  20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
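Recent litgpt releases expose a small Python API alongside the CLI recipes. A hedged sketch; the checkpoint name is illustrative and is downloaded on first use.

```python
# High-level litgpt inference API (available in recent releases).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # illustrative checkpoint
text = llm.generate("What do Llamas eat?", max_new_tokens=50)
print(text)
```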
- inference-benchmarker (Rust, fork of huggingface/inference-benchmarker)
  Inference server benchmarking tool
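The benchmarker itself is a Rust binary, but the measurement it automates is easy to illustrate. The toy Python below is not this tool's code: it only sketches the basic idea of firing concurrent requests at an assumed endpoint and reporting latency and throughput.

```python
# Toy benchmark sketch: concurrent requests, p50 latency, throughput.
import time
import statistics
import concurrent.futures
import requests

URL = "http://localhost:8080/generate"  # assumed TGI-style endpoint
PAYLOAD = {"inputs": "Hello", "parameters": {"max_new_tokens": 16}}

def one_request(_: int) -> float:
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=60).raise_for_status()
    return time.perf_counter() - start

t0 = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    latencies = list(pool.map(one_request, range(32)))
elapsed = time.perf_counter() - t0

print(f"p50 latency: {statistics.median(latencies):.3f}s")
print(f"throughput:  {len(latencies) / elapsed:.1f} req/s")
```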
Repositories
- triton-server (fork of triton-inference-server/server)
  The Triton Inference Server provides an optimized cloud and edge inferencing solution.
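Triton is usually driven from the tritonclient package (`pip install tritonclient[http]`). A minimal HTTP-client sketch; the model name, tensor names, shape, and dtype are placeholders that depend on the deployed model repository.

```python
# Client sketch for a running Triton server on its default HTTP port.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
inp = httpclient.InferInput("input__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer("my_model", inputs=[inp])  # placeholder model name
print(result.as_numpy("output__0").shape)        # placeholder output name
```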
- FastChat (fork of lm-sys/FastChat)
  An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
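FastChat can serve models behind an OpenAI-compatible REST API (`python3 -m fastchat.serve.openai_api_server`). A hedged sketch against an assumed local deployment; the port and model name are illustrative.

```python
# Query a local FastChat OpenAI-compatible server.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed host/port
    json={
        "model": "vicuna-7b-v1.5",  # illustrative served model
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```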
- ncnn (fork of Tencent/ncnn)
  ncnn is a high-performance neural network inference framework optimized for the mobile platform.
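ncnn is a C++ framework but ships Python bindings that follow the same load-param/load-model/extract flow. A sketch with placeholder file names and blob names; these depend entirely on the converted model.

```python
# Sketch using ncnn's Python bindings (pip install ncnn).
import numpy as np
import ncnn

net = ncnn.Net()
net.load_param("model.param")  # placeholder: network structure file
net.load_model("model.bin")    # placeholder: weights file

ex = net.create_extractor()
in_mat = ncnn.Mat(np.random.rand(3, 224, 224).astype(np.float32))
ex.input("data", in_mat)            # placeholder input blob name
ret, out_mat = ex.extract("output") # placeholder output blob name
print(ret, np.array(out_mat).shape)
```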
- onnxruntime (fork of microsoft/onnxruntime)
  ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
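Running an exported model with ONNX Runtime takes only a few lines. A minimal sketch; `model.onnx` and its input shape are placeholders.

```python
# Load an ONNX model and run one inference on the CPU provider.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = sess.run(None, {input_name: x})
print([o.shape for o in outputs])
```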
- optimum-intel (fork of huggingface/optimum-intel)
  🤗 Optimum Intel: Accelerate inference with Intel optimization tools
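One of Optimum Intel's paths is exporting a Hugging Face model to OpenVINO at load time. A hedged sketch; the model id is illustrative.

```python
# Export a HF model to OpenVINO on the fly and run generation.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Intel optimization tools", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```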
People
This organization has no public members.