InferenceNexus
Popular repositories
- text-generation-inference (Python, fork of huggingface/text-generation-inference)
  Large Language Model Text Generation Inference
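TGI is typically consumed over HTTP once the server is running. A minimal client sketch against its `/generate` endpoint; the host, port, prompt, and generation parameters below are illustrative assumptions, not values from this repository.

```python
# Client-side sketch for a text-generation-inference server.
# Assumes a TGI instance is already listening at localhost:8080
# (e.g. started from the official Docker image).
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # assumed host/port
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```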
- ipex-llm (Python, fork of intel/ipex-llm)
  Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
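ipex-llm offers a drop-in replacement for the Hugging Face transformers loading API. A minimal sketch, assuming an Intel XPU device is available; the model id is illustrative.

```python
# Hedged sketch of ipex-llm's transformers-compatible API: load a model
# with 4-bit quantization and run it on an Intel XPU.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")  # assumes an XPU device is present

inputs = tokenizer("What is LLM inference?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```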
- litellm (Python, fork of BerriAI/litellm)
  Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
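The point of litellm is that one OpenAI-style call shape covers many providers. A minimal sketch, assuming `OPENAI_API_KEY` is set in the environment; the model names are illustrative.

```python
# One unified completion call; switching providers is a model-string change.
from litellm import completion

response = completion(
    model="gpt-4o-mini",  # illustrative model
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)

# e.g. an Anthropic model via the same call shape (assumed model string):
# completion(model="anthropic/claude-3-haiku-20240307", messages=[...])
```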
- litgpt (Python, fork of Lightning-AI/litgpt)
  20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
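Recent litgpt releases expose a small Python API alongside the CLI recipes. A hedged sketch; the checkpoint name is illustrative and is downloaded on first use.

```python
# High-level litgpt inference API (available in recent releases).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # illustrative checkpoint
text = llm.generate("What do Llamas eat?", max_new_tokens=50)
print(text)
```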
- inference-benchmarker (Rust, fork of huggingface/inference-benchmarker)
  Inference server benchmarking tool
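The benchmarker itself is a Rust binary, but the measurement it automates is easy to illustrate. The toy Python below is not this tool's code: it only sketches the basic idea of firing concurrent requests at an assumed endpoint and reporting latency and throughput.

```python
# Toy benchmark sketch: concurrent requests, p50 latency, throughput.
import time
import statistics
import concurrent.futures
import requests

URL = "http://localhost:8080/generate"  # assumed TGI-style endpoint
PAYLOAD = {"inputs": "Hello", "parameters": {"max_new_tokens": 16}}

def one_request(_: int) -> float:
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=60).raise_for_status()
    return time.perf_counter() - start

t0 = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    latencies = list(pool.map(one_request, range(32)))
elapsed = time.perf_counter() - t0

print(f"p50 latency: {statistics.median(latencies):.3f}s")
print(f"throughput:  {len(latencies) / elapsed:.1f} req/s")
```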
Repositories
- triton-server (fork of triton-inference-server/server)
  The Triton Inference Server provides an optimized cloud and edge inferencing solution.
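Triton is usually driven from the tritonclient package (`pip install tritonclient[http]`). A minimal HTTP-client sketch; the model name, tensor names, shape, and dtype are placeholders that depend on the deployed model repository.

```python
# Client sketch for a running Triton server on its default HTTP port.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
inp = httpclient.InferInput("input__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer("my_model", inputs=[inp])  # placeholder model name
print(result.as_numpy("output__0").shape)        # placeholder output name
```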
- FastChat (fork of lm-sys/FastChat)
  An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
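FastChat can serve models behind an OpenAI-compatible REST API (`python3 -m fastchat.serve.openai_api_server`). A hedged sketch against an assumed local deployment; the port and model name are illustrative.

```python
# Query a local FastChat OpenAI-compatible server.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed host/port
    json={
        "model": "vicuna-7b-v1.5",  # illustrative served model
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```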
- ncnn (fork of Tencent/ncnn)
  ncnn is a high-performance neural network inference framework optimized for the mobile platform.
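ncnn is a C++ framework but ships Python bindings that follow the same load-param/load-model/extract flow. A sketch with placeholder file names and blob names; these depend entirely on the converted model.

```python
# Sketch using ncnn's Python bindings (pip install ncnn).
import numpy as np
import ncnn

net = ncnn.Net()
net.load_param("model.param")  # placeholder: network structure file
net.load_model("model.bin")    # placeholder: weights file

ex = net.create_extractor()
in_mat = ncnn.Mat(np.random.rand(3, 224, 224).astype(np.float32))
ex.input("data", in_mat)            # placeholder input blob name
ret, out_mat = ex.extract("output") # placeholder output blob name
print(ret, np.array(out_mat).shape)
```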
- onnxruntime (fork of microsoft/onnxruntime)
  ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
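Running an exported model with ONNX Runtime takes only a few lines. A minimal sketch; `model.onnx` and its input shape are placeholders.

```python
# Load an ONNX model and run one inference on the CPU provider.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = sess.run(None, {input_name: x})
print([o.shape for o in outputs])
```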
- optimum-intel (fork of huggingface/optimum-intel)
  🤗 Optimum Intel: Accelerate inference with Intel optimization tools
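One of Optimum Intel's paths is exporting a Hugging Face model to OpenVINO at load time. A hedged sketch; the model id is illustrative.

```python
# Export a HF model to OpenVINO on the fly and run generation.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Intel optimization tools", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```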
People
This organization has no public members.