LLM RAG Toolkit


Validated hardware

  • CPU: 13th generation Intel® Core™ processors or newer
  • GPU: Intel® Arc™ graphics
  • RAM: 32GB
  • Disk: 128GB

Application ports

Ensure that the following ports are available before running the applications.

Apps      Port
UI        8010
Backend   8011
Serving   8012
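
To confirm that the three ports are free before starting, a quick check like the one below can help. This is a minimal sketch, assuming a bash shell (it relies on bash's `/dev/tcp` redirection); the function name `port_in_use` is hypothetical and not part of the toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): succeeds if something
# is already listening on the given local TCP port.
port_in_use() {
  # bash-only: opening /dev/tcp/HOST/PORT attempts a TCP connection.
  # The subshell keeps the file descriptor from leaking into this shell.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Ports from the list above: UI, Backend, Serving.
for port in 8010 8011 8012; do
  if port_in_use "$port"; then
    echo "port $port: in use"
  else
    echo "port $port: free"
  fi
done
```

If a port is reported as in use, stop the process that holds it before running the applications.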

Quick Start


If you are using this bundle without a fine-tuned model, you must complete the steps below before running the setup.

1. Install operating system

Install the latest Ubuntu* 22.04 LTS Desktop. Refer to the Ubuntu Desktop installation tutorial if needed.

2. Create a Hugging Face account and generate an access token. For more information, please refer to link.

3. Log in to your Hugging Face account, browse to mistralai/Mistral-7B-Instruct-v0.3, and click the "Agree and access repository" button.

4. Clone repository

This step will clone the repository.

sudo apt install git
git clone

5. Go to the LLM RAG Toolkit use case directory

This step changes into the LLM RAG Toolkit use case directory inside the cloned repository.

cd edge-developer-kit-reference-scripts/usecases/llm/rag-toolkit

6. Run the setup script

This step will download all the dependencies needed to run the application.


7. Start all the services

Run the script to start all the services. On the first run, the script downloads some assets required by the services, so ensure you have an internet connection.


Docker Setup


  1. Docker and Docker Compose must be set up before running the commands below. Refer to here to set up Docker.
  2. Install the necessary GPU drivers.
    • Refer to here to set up GPU drivers.

1. Set up env

Set INSTALL_OPTION in the .env file.

1 = VLLM (OpenVINO - CPU)

  • Provide HF_TOKEN as well if using this option. Refer here to create a token.
  • Ensure the Hugging Face token has access to the Mistral-7B-Instruct-v0.3 model. Refer here to get access to the model.


cp .env.template .env
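
After copying the template, INSTALL_OPTION and HF_TOKEN can be edited by hand or set from the command line. The sketch below assumes that `.env` holds plain `KEY=value` lines and that GNU `sed` is available (as on Ubuntu). The helper `set_env_var` is hypothetical and not part of the toolkit.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# KEY= line or appending a new one if the key is absent.
set_env_var() {
  file=$1
  key=$2
  value=$3
  if grep -q "^${key}=" "$file"; then
    # "|" is the sed delimiter so values containing "/" are safe.
    # Values containing "|" or "&" would need escaping first.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage, from the directory that contains .env:
#   set_env_var .env INSTALL_OPTION 1
#   set_env_var .env HF_TOKEN "<your Hugging Face token>"
```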

2. Build docker container

docker compose build

3. Start docker container

docker compose up -d


Utilize NPU in AI PC

Speech-to-text model inference can be offloaded to the NPU on an AI PC. Set ENCODER_DEVICE to NPU in backend/config.yaml to run the encoder model on the NPU. Currently, only the encoder model can run on the NPU.

# Example:
  MODEL_ID: base
  ENCODER_DEVICE: NPU # <- Edit this line to NPU
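
The same edit can be made from the command line. This is a minimal sketch, assuming GNU `sed` and that `ENCODER_DEVICE:` sits on its own line in the config file. The helper `set_encoder_device` is hypothetical and not part of the toolkit.

```shell
# Hypothetical helper: rewrite the value of ENCODER_DEVICE in a YAML file,
# keeping the line's original indentation.
set_encoder_device() {
  file=$1
  device=$2
  sed -i -E "s|^([[:space:]]*ENCODER_DEVICE:).*|\1 ${device}|" "$file"
}

# Usage, from the rag-toolkit directory:
#   set_encoder_device backend/config.yaml NPU
```

Note that this replaces the whole remainder of the line, so any trailing comment on that line is dropped.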

Uninstall the app


Environment variables

You can point the backend server API at a specific running OpenAI-compatible server, including a different serving port, by changing the environment variable below.

Environment variable   Default value
OPENAI_BASE_URL        http://localhost:8012/v1
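
As an illustration of how the default applies, the sketch below prints the URL that would be used, falling back to the default when the variable is unset. The helper `resolve_base_url` is hypothetical. It is also an assumption that the services read the variable from the shell environment rather than from a file, so check how your setup passes it in.

```shell
# Hypothetical helper: print the URL the backend would use, falling back to
# the documented default when OPENAI_BASE_URL is unset or empty.
resolve_base_url() {
  echo "${OPENAI_BASE_URL:-http://localhost:8012/v1}"
}

echo "OpenAI-compatible server: $(resolve_base_url)"

# To target a different server, export the variable before starting the
# services (host and port below are placeholders):
#   export OPENAI_BASE_URL="http://<host>:<port>/v1"
```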


  1. The speech-to-text feature currently works only with localhost.
  2. RAG uses all of the uploaded documents.


  1. If you encounter errors when running the applications, refer to the log files in the logs folder.
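
For a quick look at recent log output, something like the following can be used. This is a minimal sketch: the helper `show_recent_logs` is hypothetical, and the log file names are not assumed, so it simply walks every regular file in the folder.

```shell
# Hypothetical helper: print the last 20 lines of every file in a log folder
# (defaults to ./logs).
show_recent_logs() {
  dir=${1:-logs}
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an empty glob
    echo "== $f =="
    tail -n 20 "$f"
  done
}

# Usage, from the rag-toolkit directory:
#   show_recent_logs
```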