- CPU: 13th generation Intel® Core™ processors or newer
- GPU: Intel® Arc™ graphics
- RAM: 32GB
- DISK: 128GB
Please ensure that you have these ports available before running the applications.
| Apps | Port |
|---|---|
| UI | 8010 |
| Backend | 8011 |
| Serving | 8012 |
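One way to confirm the ports are free is a quick connect test. This is a minimal sketch, assuming bash on a Linux host (it relies on bash's `/dev/tcp` redirection, so no extra tools are needed):

```shell
# Check that the application ports are free before starting the services.
# A successful connect means something is already listening on the port.
for port in 8010 8011 8012; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "Port $port is already in use"
  else
    echo "Port $port appears free"
  fi
done
```

The connection is opened in a subshell, so the file descriptor is closed automatically when the check finishes.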
If you are using this bundle without a fine-tuned model, you must follow the steps below before running the setup.
1. Install the latest Ubuntu* 22.04 LTS Desktop. Refer to the Ubuntu Desktop installation tutorial if needed.
2. Create a Hugging Face account and generate an access token. For more information, refer to link.
3. Log in to your Hugging Face account, browse to mistralai/Mistral-7B-Instruct-v0.3, and click the Agree and access repository button.
This step will clone the repository.
sudo apt install git
git clone https://github.com/intel/edge-developer-kit-reference-scripts
This step changes the working directory to the current platform setup directory.
cd edge-developer-kit-reference-scripts/usecases/llm/rag-toolkit
This step will download all the dependencies needed to run the application.
./install.sh
Run the script to start all the services. On the first run, the script downloads some assets required by the services, so please ensure you have an internet connection.
./run.sh
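After starting the services, you may want to wait until they are actually reachable before opening the UI. The helper below is a hypothetical sketch, assuming bash on a Linux host; it polls a port until it accepts connections or a timeout expires:

```shell
# Poll a local service port until it accepts connections, with a timeout
# in seconds. Returns 0 when the port is reachable, 1 on timeout.
wait_for_port() {
  local port="$1" timeout="${2:-60}" i
  for ((i = 0; i < timeout; i++)); do
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example usage after ./run.sh:
# wait_for_port 8010 120 && echo "UI is reachable on port 8010"
```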
- Docker and Docker Compose should be set up before running the commands below. Refer to here to set up Docker.
- Install the necessary GPU drivers. Refer to here to set up the GPU drivers.
Set the INSTALL_OPTION in the .env file.
1 = VLLM (OpenVINO - CPU)
- Provide an HF_TOKEN when using this option. Refer here to create a token.
- Ensure the Hugging Face token has access to the Mistral-7B-Instruct-v0.3 model. Refer here to request access to the model.
2 [default] = OLLAMA (SYCL LLAMA.CPP - CPU/GPU)
cp .env.template .env
docker compose build
docker compose up -d
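The INSTALL_OPTION edit can also be done non-interactively with `sed`. The sketch below works on a throwaway copy so it is safe to run anywhere; the key names mirror the option list above, but the exact contents of `.env.template` are an assumption:

```shell
# Illustration only: switch INSTALL_OPTION to 1 (vLLM) with sed.
# In the real setup, apply the same sed line to the .env created
# from .env.template instead of this demo file.
printf 'INSTALL_OPTION=2\nHF_TOKEN=\n' > demo.env
sed -i 's/^INSTALL_OPTION=.*/INSTALL_OPTION=1/' demo.env
grep '^INSTALL_OPTION=' demo.env   # → INSTALL_OPTION=1
```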
Speech-to-text model inference can be offloaded to the NPU device on an AI PC. Set ENCODER_DEVICE to NPU in backend/config.yaml to run the encoder model on the NPU. Currently, only the encoder model is supported on the NPU device.
# Example:
STT:
MODEL_ID: base
ENCODER_DEVICE: NPU # <- Edit this line to NPU
DECODER_DEVICE: CPU
./uninstall.sh
You can change the backend server's API port to route requests to a specific OpenAI-compatible server, as well as the serving port.
| Environment variable | Default Value |
|---|---|
| OPENAI_BASE_URL | http://localhost:8012/v1 |
| SERVER_HOST | 0.0.0.0 |
| SERVER_PORT | 8011 |
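For example, to route the backend to a different OpenAI-compatible server, export the variables before starting the backend. This is a sketch assuming the backend reads these values from the environment; the port 9000 target is a hypothetical value, not part of this bundle:

```shell
# Override the defaults before starting the backend. The 9000 port is
# only an illustration of pointing at another OpenAI-compatible server.
export OPENAI_BASE_URL="http://localhost:9000/v1"
export SERVER_HOST="0.0.0.0"
export SERVER_PORT="8011"
echo "$OPENAI_BASE_URL"   # → http://localhost:9000/v1
```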
- The speech-to-text feature currently works only on localhost.
- RAG will use all of the documents that have been uploaded.
- If you encounter errors while running the applications, refer to the log files in the logs folder.