Nexa SDK

Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS) capabilities. Additionally, it offers an OpenAI-compatible API server with JSON schema mode for function calling and streaming support, as well as a user-friendly Streamlit UI.

Features

  • Model Support:

    • ONNX & GGML models
    • Conversion Engine
    • Inference Engine:
      • Text Generation
      • Image Generation
      • Vision-Language Models (VLM)
      • Automatic Speech Recognition (ASR)
      • Text-to-Speech (TTS)

Detailed API documentation is available here.

  • Server:
    • OpenAI-compatible API
    • JSON schema mode for function calling
    • Streaming support
  • Streamlit UI for interactive model deployment and testing

Below is how Nexa SDK compares with similar tools:

Feature | Nexa SDK | ollama | Optimum | LM Studio
--- | --- | --- | --- | ---
GGML Support | ✅ | ✅ | ❌ | ✅
ONNX Support | ✅ | ❌ | ✅ | ❌
Text Generation | ✅ | ✅ | ✅ | ✅
Image Generation | ✅ | ❌ | ❌ | ❌
Vision-Language Models | ✅ | ✅ | ✅ | ✅
Text-to-Speech | ✅ | ❌ | ✅ | ❌
Server Capability | ✅ | ✅ | ✅ | ✅
User Interface | ✅ | ❌ | ❌ | ✅

Installation

We publish pre-built wheels for various Python versions, platforms, and backends on our index page for convenient installation.

CPU

pip install nexaai --index-url https://nexaai.github.io/nexa-sdk/whl/cpu --extra-index-url https://pypi.org/simple

GPU (Metal)

For the GPU version supporting Metal (macOS):

CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple

GPU (CUDA)

For the GPU version supporting CUDA (Linux/Windows):

CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple

Note

The CUDA wheels are built with CUDA 12.4 but should be compatible with all CUDA 12.x versions.

FAQ: Building Issues for llava

If you encounter build errors for llava during installation, try the following command:

CMAKE_ARGS="-DCMAKE_CXX_FLAGS=-fopenmp" pip install nexaai

Docker Usage

Note: the Docker image does not support GPU acceleration.

docker pull nexa4ai/sdk:latest

Replace the following placeholders with your model directory and command:

docker run -v <your_model_dir>:/model -it nexa4ai/sdk:latest [nexa_command] [your_model_relative_path]

Example:

docker run -v /home/ubuntu/.cache/nexa/hub/official:/model -it nexa4ai/sdk:latest nexa gen-text /model/Phi-3-mini-128k-instruct/q4_0.gguf

This creates an interactive text-generation session.

Supported Models

Model | Type | Format | Command
--- | --- | --- | ---
octopus-v2 | NLP | GGUF | nexa run octopus-v2
octopus-v4 | NLP | GGUF | nexa run octopus-v4
tinyllama | NLP | GGUF | nexa run tinyllama
llama2 | NLP | GGUF/ONNX | nexa run llama2
llama3 | NLP | GGUF/ONNX | nexa run llama3
llama3.1 | NLP | GGUF/ONNX | nexa run llama3.1
gemma | NLP | GGUF/ONNX | nexa run gemma
gemma2 | NLP | GGUF | nexa run gemma2
qwen1.5 | NLP | GGUF | nexa run qwen1.5
qwen2 | NLP | GGUF/ONNX | nexa run qwen2
mistral | NLP | GGUF/ONNX | nexa run mistral
codegemma | NLP | GGUF | nexa run codegemma
codellama | NLP | GGUF | nexa run codellama
codeqwen | NLP | GGUF | nexa run codeqwen
deepseek-coder | NLP | GGUF | nexa run deepseek-coder
dolphin-mistral | NLP | GGUF | nexa run dolphin-mistral
phi2 | NLP | GGUF | nexa run phi2
phi3 | NLP | GGUF/ONNX | nexa run phi3
llama2-uncensored | NLP | GGUF | nexa run llama2-uncensored
llama3-uncensored | NLP | GGUF | nexa run llama3-uncensored
llama2-function-calling | NLP | GGUF | nexa run llama2-function-calling
nanollava | Multimodal | GGUF | nexa run nanollava
llava-phi3 | Multimodal | GGUF | nexa run llava-phi3
llava-llama3 | Multimodal | GGUF | nexa run llava-llama3
llava1.6-mistral | Multimodal | GGUF | nexa run llava1.6-mistral
llava1.6-vicuna | Multimodal | GGUF | nexa run llava1.6-vicuna
stable-diffusion-v1-4 | Computer Vision | GGUF | nexa run sd1-4
stable-diffusion-v1-5 | Computer Vision | GGUF/ONNX | nexa run sd1-5
lcm-dreamshaper | Computer Vision | GGUF/ONNX | nexa run lcm-dreamshaper
hassaku-lcm | Computer Vision | GGUF | nexa run hassaku-lcm
anything-lcm | Computer Vision | GGUF | nexa run anything-lcm
faster-whisper-tiny | Audio | BIN | nexa run faster-whisper-tiny
faster-whisper-small | Audio | BIN | nexa run faster-whisper-small
faster-whisper-medium | Audio | BIN | nexa run faster-whisper-medium
faster-whisper-base | Audio | BIN | nexa run faster-whisper-base
faster-whisper-large | Audio | BIN | nexa run faster-whisper-large

CLI Reference

usage: nexa [-h] [-V] {run,onnx,server,pull,remove,clean,list,login,whoami,logout} ...

Nexa CLI tool for handling various model operations.

positional arguments:
  {run,onnx,server,pull,remove,clean,list,login,whoami,logout}
                        sub-command help
    run                 Run inference for various tasks using GGUF models.
    onnx                Run inference for various tasks using ONNX models.
    server              Run the Nexa AI Text Generation Service
    pull                Pull a model from official or hub.
    remove              Remove a model from local machine.
    clean               Clean up all model files.
    list                List all models in the local machine.
    login               Login to Nexa API.
    whoami              Show current user information.
    logout              Logout from Nexa API.

options:
  -h, --help            show this help message and exit
  -V, --version         Show the version of the Nexa SDK.

List Local Models

List all models on your local computer.

nexa list

Download a Model

Download a model file to your local computer from Nexa Model Hub.

nexa pull MODEL_PATH
usage: nexa pull [-h] model_path

positional arguments:
  model_path  Path or identifier for the model in Nexa Model Hub

options:
  -h, --help  show this help message and exit

Example

nexa pull llama2

Remove a Model

Remove a model from your local computer.

nexa remove MODEL_PATH
usage: nexa remove [-h] model_path

positional arguments:
  model_path  Path or identifier for the model in Nexa Model Hub

options:
  -h, --help  show this help message and exit

Example

nexa remove llama2

Remove All Downloaded Models

Remove all downloaded models on your local computer.

nexa clean

Run a Model

Run a model on your local computer. If the model file is not yet downloaded, it will be automatically fetched first.

By default, nexa runs GGUF models. To run ONNX models, use nexa onnx MODEL_PATH.

Run Text-Generation Model

nexa run MODEL_PATH
usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path

positional arguments:
  model_path            Path or identifier for the model in Nexa Model Hub

options:
  -h, --help            show this help message and exit
  -pf, --profiling      Enable profiling logs for the inference process
  -st, --streamlit      Run the inference in Streamlit UI

Text generation options:
  -t, --temperature TEMPERATURE
                        Temperature for sampling
  -m, --max_new_tokens MAX_NEW_TOKENS
                        Maximum number of new tokens to generate
  -k, --top_k TOP_K     Top-k sampling parameter
  -p, --top_p TOP_P     Top-p sampling parameter
  -sw, --stop_words [STOP_WORDS ...]
                        List of stop words for early stopping
Example
nexa run llama2
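
For instance, the sampling flags documented above can be combined in a single invocation (the values here are purely illustrative):

nexa run llama2 -t 0.7 -m 256 -k 40 -p 0.9 -sw "User:" -pf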

Run Image-Generation Model

nexa run MODEL_PATH
usage: nexa run [-h] [-i2i] [-ns NUM_INFERENCE_STEPS] [-np NUM_IMAGES_PER_PROMPT] [-H HEIGHT] [-W WIDTH] [-g GUIDANCE_SCALE] [-o OUTPUT] [-s RANDOM_SEED] [-st] model_path

positional arguments:
  model_path            Path or identifier for the model in Nexa Model Hub

options:
  -h, --help            show this help message and exit
  -st, --streamlit      Run the inference in Streamlit UI

Image generation options:
  -i2i, --img2img       Whether to run image-to-image generation
  -ns, --num_inference_steps NUM_INFERENCE_STEPS
                        Number of inference steps
  -np, --num_images_per_prompt NUM_IMAGES_PER_PROMPT
                        Number of images to generate per prompt
  -H, --height HEIGHT   Height of the output image
  -W, --width WIDTH     Width of the output image
  -g, --guidance_scale GUIDANCE_SCALE
                        Guidance scale for diffusion
  -o, --output OUTPUT   Output path for the generated image
  -s, --random_seed RANDOM_SEED
                        Random seed for image generation
  --lora_dir LORA_DIR   Path to directory containing LoRA files
  --wtype WTYPE         Weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
  --control_net_path CONTROL_NET_PATH
                        Path to control net model
  --control_image_path CONTROL_IMAGE_PATH
                        Path to image condition for Control Net
  --control_strength CONTROL_STRENGTH
                        Strength to apply Control Net
Example
nexa run sd1-4
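
As an illustration, the image-generation flags documented above can be combined like this (values chosen for demonstration only):

nexa run sd1-4 -H 512 -W 512 -ns 20 -g 7.5 -s 42 -o generated.png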

Run Vision-Language Model

nexa run MODEL_PATH
usage: nexa run [-h] [-t TEMPERATURE] [-m MAX_NEW_TOKENS] [-k TOP_K] [-p TOP_P] [-sw [STOP_WORDS ...]] [-pf] [-st] model_path

positional arguments:
  model_path            Path or identifier for the model in Nexa Model Hub

options:
  -h, --help            show this help message and exit
  -pf, --profiling      Enable profiling logs for the inference process
  -st, --streamlit      Run the inference in Streamlit UI

VLM generation options:
  -t, --temperature TEMPERATURE
                        Temperature for sampling
  -m, --max_new_tokens MAX_NEW_TOKENS
                        Maximum number of new tokens to generate
  -k, --top_k TOP_K     Top-k sampling parameter
  -p, --top_p TOP_P     Top-p sampling parameter
  -sw, --stop_words [STOP_WORDS ...]
                        List of stop words for early stopping
Example
nexa run nanollava

Run Audio Model

nexa run MODEL_PATH
usage: nexa run [-h] [-o OUTPUT_DIR] [-b BEAM_SIZE] [-l LANGUAGE] [--task TASK] [-t TEMPERATURE] [-c COMPUTE_TYPE] [-st] model_path

positional arguments:
  model_path            Path or identifier for the model in Nexa Model Hub

options:
  -h, --help            show this help message and exit
  -st, --streamlit      Run the inference in Streamlit UI

Automatic Speech Recognition options:
  -b, --beam_size BEAM_SIZE
                        Beam size to use for transcription
  -l, --language LANGUAGE
                        The language spoken in the audio. It should be a language code such as 'en' or 'fr'.
  --task TASK           Task to execute (transcribe or translate)
  -c, --compute_type COMPUTE_TYPE
                        Type to use for computation (e.g., float16, int8, int8_float16)
Example
nexa run faster-whisper-tiny
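
For example, the transcription options above can be set explicitly (illustrative values):

nexa run faster-whisper-tiny -l en --task transcribe -b 5 -c float16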

Start Local Server

Start a local server using models on your local computer.

nexa server MODEL_PATH
usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] model_path

positional arguments:
  model_path   Path or identifier for the model in S3

options:
  -h, --help   show this help message and exit
  --host HOST  Host to bind the server to
  --port PORT  Port to bind the server to
  --reload     Enable automatic reloading on code changes

Example

nexa server llama2
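
For example, to bind the server to all interfaces on a specific port (host and port values are illustrative):

nexa server llama2 --host 0.0.0.0 --port 8000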

Model Path Format

For model_path in nexa commands, follow the standard format to ensure correct model loading and execution. The standard format for model_path is:

  • [user_name]/[repo_name]:[tag_name] (user's model)
  • [repo_name]:[tag_name] (official model)

Examples:

  • gemma-2b:q4_0
  • Meta-Llama-3-8B-Instruct:onnx-cpu-int8
  • alanzhuly/Qwen2-1B-Instruct:q4_0
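
Putting this together, a tagged model can be pulled or run directly by its full identifier, for example:

nexa pull gemma-2b:q4_0
nexa run gemma-2b:q4_0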

Start Local Server

You can start a local server using models on your local computer with the nexa server command. Here's the usage syntax:

usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] model_path

Options:

  • --host: Host to bind the server to
  • --port: Port to bind the server to
  • --reload: Enable automatic reloading on code changes

Example Commands:

nexa server gemma
nexa server llama2-function-calling
nexa server sd1-5
nexa server faster-whisper-large

By default, nexa server runs GGUF models. To run ONNX models, add onnx after nexa server.

API Endpoints

1. Text Generation: /v1/completions

Generates text based on a single prompt.

Request body:

{
  "prompt": "Tell me a story",
  "temperature": 1,
  "max_new_tokens": 128,
  "top_k": 50,
  "top_p": 1,
  "stop_words": ["string"]
}

Example Response:

{
  "result": "Once upon a time, in a small village nestled among rolling hills..."
}
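
For reference, here is a minimal Python client for this endpoint. It assumes the server is reachable at http://localhost:8000; adjust the base URL to match your nexa server --host/--port settings.

import requests

# Hypothetical base URL; adjust to your `nexa server` --host/--port settings.
BASE_URL = "http://localhost:8000"

payload = {
    "prompt": "Tell me a story",
    "temperature": 1,
    "max_new_tokens": 128,
    "top_k": 50,
    "top_p": 1,
    "stop_words": [],
}

resp = requests.post(f"{BASE_URL}/v1/completions", json=payload)
resp.raise_for_status()
print(resp.json()["result"])  # e.g. "Once upon a time..."
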
2. Chat Completions: /v1/chat/completions

Handles chat completions with support for conversation history.

Request body:

{
  "messages": [
    {
      "role": "user",
      "content": "Tell me a story"
    }
  ],
  "max_tokens": 128,
  "temperature": 0.1,
  "stream": false,
  "stop_words": []
}

Example Response:

{
  "id": "f83502df-7f5a-4825-a922-f5cece4081de",
  "object": "chat.completion",
  "created": 1723441724.914671,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "In the heart of a mystical forest..."
      }
    }
  ]
}
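
A sketch of the same call from Python, non-streaming (the base URL is again an assumption):

import requests

BASE_URL = "http://localhost:8000"  # hypothetical; adjust to your server

payload = {
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "max_tokens": 128,
    "temperature": 0.1,
    "stream": False,
    "stop_words": [],
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
resp.raise_for_status()
# The response follows the chat.completion shape shown above.
print(resp.json()["choices"][0]["message"]["content"])

To keep multi-turn context, append each assistant reply to messages before sending the next request.
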
3. Function Calling: /v1/function-calling

Calls the most appropriate function based on the user's prompt.

Request body:

{
  "messages": [
    {
      "role": "user",
      "content": "Extract Jason is 25 years old"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "UserDetail",
        "parameters": {
          "properties": {
            "name": {
              "description": "The user's name",
              "type": "string"
            },
            "age": {
              "description": "The user's age",
              "type": "integer"
            }
          },
          "required": ["name", "age"],
          "type": "object"
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Function format:

{
  "type": "function",
  "function": {
    "name": "function_name",
    "description": "function_description",
    "parameters": {
      "type": "object",
      "properties": {
        "property_name": {
          "type": "string | number | boolean | object | array",
          "description": "string"
        }
      },
      "required": ["array_of_required_property_names"]
    }
  }
}

Example Response:

{
  "id": "chatcmpl-7a9b0dfb-878f-4f75-8dc7-24177081c1d0",
  "object": "chat.completion",
  "created": 1724186442,
  "model": "/home/ubuntu/.cache/nexa/hub/official/Llama2-7b-function-calling/q3_K_M.gguf",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call__0_UserDetail_cmpl-8d5cf645-7f35-4af2-a554-2ccea1a67bdd",
            "type": "function",
            "function": {
              "name": "UserDetail",
              "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
            }
          }
        ],
        "function_call": {
          "name": "",
          "arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
        }
      }
    }
  ],
  "usage": {
    "completion_tokens": 15,
    "prompt_tokens": 316,
    "total_tokens": 331
  }
}
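
A minimal Python sketch of this request, reusing the documented body (base URL assumed as above):

import json
import requests

BASE_URL = "http://localhost:8000"  # hypothetical; adjust to your server

payload = {
    "messages": [{"role": "user", "content": "Extract Jason is 25 years old"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "UserDetail",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "The user's name"},
                    "age": {"type": "integer", "description": "The user's age"},
                },
                "required": ["name", "age"],
            },
        },
    }],
    "tool_choice": "auto",
}

resp = requests.post(f"{BASE_URL}/v1/function-calling", json=payload)
resp.raise_for_status()
call = resp.json()["choices"][0]["message"]["tool_calls"][0]["function"]
# `arguments` arrives as a JSON string, as in the example response above.
print(call["name"], json.loads(call["arguments"]))
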
4. Text-to-Image: /v1/txt2img

Generates images based on a single prompt.

Request body:

{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}

Example Response:

{
  "created": 1724186615.5426757,
  "data": [
    {
      "base64": "base64_of_generated_image",
      "url": "path/to/generated_image"
    }
  ]
}
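
As an illustrative Python client, this posts the documented body and saves the returned base64 image (the base URL and output filename are assumptions):

import base64
import requests

BASE_URL = "http://localhost:8000"  # hypothetical; adjust to your server

payload = {
    "prompt": "A girl, standing in a field of flowers, vivid",
    "image_path": "",
    "cfg_scale": 7,
    "width": 256,
    "height": 256,
    "sample_steps": 20,
    "seed": 0,
    "negative_prompt": "",
}

resp = requests.post(f"{BASE_URL}/v1/txt2img", json=payload)
resp.raise_for_status()
# The response carries the image as base64, as shown above.
image_b64 = resp.json()["data"][0]["base64"]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
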
5. Image-to-Image: /v1/img2img

Modifies existing images based on a single prompt.

Request body:

{
  "prompt": "A girl, standing in a field of flowers, vivid",
  "image_path": "path/to/image",
  "cfg_scale": 7,
  "width": 256,
  "height": 256,
  "sample_steps": 20,
  "seed": 0,
  "negative_prompt": ""
}

Example Response:

{
  "created": 1724186615.5426757,
  "data": [
    {
      "base64": "base64_of_generated_image",
      "url": "path/to/generated_image"
    }
  ]
}

6. Audio Transcriptions: /v1/audio/transcriptions

Transcribes audio files to text.

Parameters:

  • beam_size (integer): Beam size for transcription (default: 5)
  • language (string): Language code (e.g., 'en', 'fr')
  • temperature (number): Temperature for sampling (default: 0)

Request body (multipart/form-data):

  • file: the audio file to transcribe (required)

Example Response:

{
  "text": " And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
}
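
A minimal Python upload sketch (the base URL and audio filename are assumptions; optional parameters such as beam_size are left at their defaults):

import requests

BASE_URL = "http://localhost:8000"  # hypothetical; adjust to your server

# Send the audio file as multipart form data under the `file` field.
with open("speech.wav", "rb") as audio:
    resp = requests.post(f"{BASE_URL}/v1/audio/transcriptions", files={"file": audio})
resp.raise_for_status()
print(resp.json()["text"])
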
7. Audio Translations: /v1/audio/translations

Translates audio files to text in English.

Parameters:

  • beam_size (integer): Beam size for transcription (default: 5)
  • temperature (number): Temperature for sampling (default: 0)

Request body (multipart/form-data):

  • file: the audio file to translate (required)

Example Response:

{
  "text": " Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday"
}

Acknowledgements

We would like to thank the following projects:
