ray-project/llmperf
LLMPerfV2 (#19)

The latest version of LLMPerf brings a suite of significant updates designed to provide more in-depth and customizable benchmarking capabilities for LLM inference. These updates include:

- Expanded metrics with quantile distributions (P25-P99): comprehensive data representation for deeper insights (sketch below).
- Customizable benchmarking parameters: tailor parameters to fit specific use-case scenarios.
- Introduction of a load test and a correctness test: assess performance and accuracy under stress (sketch below).
- Broad compatibility: supports a range of products including [Anyscale Endpoints](https://www.anyscale.com/endpoints), [OpenAI](https://openai.com/blog/openai-api), [Anthropic](https://docs.anthropic.com/claude/reference/getting-started-with-the-api), [together.ai](http://together.ai/), [Fireworks.ai](https://app.fireworks.ai/), [Perplexity](https://www.perplexity.ai/), [Huggingface](https://huggingface.co/inference-endpoints), [Lepton AI](https://www.lepton.ai/docs/overview/model_apis), and the various APIs supported by the [LiteLLM project](https://litellm.ai/).
- Easy addition of new LLMs via the LLMClient API (sketch below).

Signed-off-by: Avnish Narayan <[email protected]>
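
To make the quantile reporting concrete, here is a minimal sketch of how percentile summaries such as P25-P99 can be derived from per-request measurements. The `summarize()` helper and the sample latencies are illustrative, not part of the llmperf codebase.

```python
# Illustrative only: how quantile summaries like those reported by LLMPerf
# can be computed from per-request measurements. The summarize() helper and
# the sample data below are hypothetical, not part of the llmperf codebase.
import numpy as np


def summarize(samples, quantiles=(25, 50, 75, 90, 95, 99)):
    """Return a dict mapping 'p<q>' to the q-th percentile of the samples."""
    values = np.asarray(samples, dtype=float)
    return {f"p{q}": float(np.percentile(values, q)) for q in quantiles}


# Hypothetical end-to-end latencies (seconds) collected from benchmark requests.
e2e_latencies = [1.8, 2.1, 2.4, 2.2, 3.0, 1.9, 2.6, 2.3, 4.1, 2.0]
print(summarize(e2e_latencies))
# Prints a summary along the lines of:
# {'p25': 2.03, 'p50': 2.25, 'p75': 2.55, 'p90': 3.11, 'p95': 3.61, 'p99': 4.0}
```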
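
The next sketch shows what adding a new LLM through the LLMClient extension point might look like, assuming the base class boils down to a single request method. The `send_request()` name, its signature, and `ExampleProviderClient` are placeholders rather than the actual llmperf interface; consult the llmperf source for the real API.

```python
# Illustrative sketch of plugging a new provider into a benchmark harness.
# The LLMClient name comes from these release notes; the method name, its
# signature, and ExampleProviderClient are assumptions, not the actual
# llmperf interface.
import time

import requests  # third-party HTTP client, used here only for illustration


class LLMClient:
    """Assumed shape of the base class: one method that issues a request
    and returns per-request metrics for it."""

    def send_request(self, prompt: str, sampling_params: dict) -> dict:
        raise NotImplementedError


class ExampleProviderClient(LLMClient):
    """Hypothetical client for an OpenAI-compatible chat completions endpoint."""

    def __init__(self, base_url: str, api_key: str, model: str):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model

    def send_request(self, prompt: str, sampling_params: dict) -> dict:
        start = time.monotonic()
        response = requests.post(
            f"{self.base_url}/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={
                "model": self.model,
                "messages": [{"role": "user", "content": prompt}],
                **sampling_params,
            },
            timeout=120,
        )
        response.raise_for_status()
        text = response.json()["choices"][0]["message"]["content"]
        return {
            "generated_text": text,
            "end_to_end_latency_s": time.monotonic() - start,
        }
```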
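
Finally, a conceptual sketch of the load test's shape: issue many requests concurrently and aggregate per-request latencies into quantile summaries. llmperf itself drives concurrency with Ray, so the plain-threads `run_load_test()` helper here is purely illustrative.

```python
# Illustrative only: the general shape of a load test. The actual llmperf
# load test uses Ray for concurrency; run_load_test() is a conceptual sketch.
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_load_test(client, prompts, concurrency=8, sampling_params=None):
    """Run one request per prompt with at most `concurrency` in flight.

    `client` is any object with a send_request(prompt, sampling_params)
    method returning a dict containing an 'end_to_end_latency_s' key, such
    as the ExampleProviderClient from the sketch above.
    """
    sampling_params = sampling_params or {"max_tokens": 256}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(
            pool.map(lambda p: client.send_request(p, sampling_params), prompts)
        )
    latencies = [r["end_to_end_latency_s"] for r in results]
    return {
        f"p{q}": float(np.percentile(latencies, q))
        for q in (25, 50, 75, 90, 95, 99)
    }
```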