# Apple MLX Integration

You can use [Apple MLX](https://github.com/ml-explore/mlx) as an optimized worker implementation in FastChat. It runs models efficiently on Apple Silicon.

See the supported models in the [mlx-examples](https://github.com/ml-explore/mlx-examples) repository.

Note that on Apple Silicon Macs with less memory, smaller or quantized models are recommended.

## Instructions

1. Install MLX.

   ```
   pip install "mlx-lm>=0.0.6"
   ```

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the MLX worker (`fastchat.serve.mlx_worker`). Remember to launch the model worker only after you have launched the controller (see the main FastChat serving instructions).

   ```
   python3 -m fastchat.serve.mlx_worker --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0
   ```
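
If memory is tight, you can point the worker at a pre-quantized MLX checkpoint instead of a full-precision one. A minimal sketch, assuming a 4-bit conversion published under the mlx-community organization on Hugging Face (the exact model name here is illustrative):

```
# Launch the MLX worker with an example 4-bit quantized checkpoint
# (model name is an assumption; substitute any MLX-compatible model).
python3 -m fastchat.serve.mlx_worker --model-path mlx-community/Mistral-7B-Instruct-v0.2-4bit
```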
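
To confirm the worker has registered with the controller, you can send it a test prompt with FastChat's built-in test utility. This sketch assumes the default controller address and the default model name, which FastChat derives from the last component of the model path used above:

```
# Send a test prompt to the worker serving TinyLlama-1.1B-Chat-v1.0
python3 -m fastchat.serve.test_message --model-name TinyLlama-1.1B-Chat-v1.0
```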