Added Support for Apple Silicon #1289
base: main
Conversation
- No gguf support yet.
- Build Triton and bitsandbytes from source.
- `cmake -DCOMPUTE_BACKEND=mps -S .` for the bitsandbytes build.
Is this working?
Hi there, thank you for this. We will need a bit more time to review! :)
Hi @shashikanth-a, thank you for this. Could you please provide information about the environment and package versions you used for development?
Hey, does this work with the newly released vision support?
Currently I can run this if:
- lazy loading of model
- minor refactoring
- optimizers and lr schedulers
- gc
- should improve memory consumption
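The "lazy loading" and "gc" items above could be sketched as follows. This is a minimal illustration, not the PR's actual code; the `LazyModel` class and its stand-in loader are hypothetical.

```python
import gc

class LazyModel:
    """Defer loading model weights until first use, then free them explicitly."""

    def __init__(self, loader):
        self._loader = loader      # zero-arg callable that returns the weights
        self._weights = None       # nothing loaded yet

    @property
    def weights(self):
        if self._weights is None:  # load on first access only
            self._weights = self._loader()
        return self._weights

    def unload(self):
        """Drop the weights and force a garbage-collection pass."""
        self._weights = None
        gc.collect()

# Usage with a stand-in loader (a real one would read tensors from disk):
model = LazyModel(lambda: {"layer0": [0.0] * 1024})
_ = model.weights              # triggers the actual load
model.unload()                 # releases memory between phases
```

Deferring the load keeps peak memory lower when the model is only needed for part of the run, which matches the "should improve memory consumption" note.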
With the changes I can run this out of the box with the steps outlined above:
On an M4 Pro I'm getting around 100 t/s for llama3-8b. Can confirm it will also now work with llama-3.2-3b.
Thanks a lot - would anyone be so kind as to benchmark this against MLX itself and share results? Time taken, amount of VRAM, context length, and whether the losses match - of course that's a lot, so just timing and checking whether the losses match would be more than helpful. Thank you so much! :)
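The loss check requested above can be as simple as comparing the two runs' logged loss values element-wise. A sketch (the function name and tolerance are assumptions, not part of either codebase):

```python
def losses_match(a, b, tol=1e-3):
    """Return True if two loss curves have the same length and agree within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

# Example: compare loss logs from an MPS run and an MLX run of the same recipe.
mps_losses = [2.31, 1.87, 1.52]
mlx_losses = [2.31, 1.87, 1.53]
print(losses_match(mps_losses, mlx_losses, tol=0.02))
```

Exact equality is unrealistic across backends, so a small tolerance per step is the practical check.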
Use `cmake -DCOMPUTE_BACKEND=mps -S .` for the bitsandbytes build.
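Putting the thread's build hints together, a small backend-selection helper might look like this. The machine-type-to-backend mapping is an assumption based on the comments above (`mps` on Apple Silicon, defaulting to `cuda` elsewhere); this only prints the configure command rather than running a full build.

```shell
#!/bin/sh
# Sketch: choose the bitsandbytes COMPUTE_BACKEND for the current machine.
pick_backend() {
    case "$1" in
        arm64-Darwin) echo mps ;;   # Apple Silicon macOS
        *)            echo cuda ;;  # default: NVIDIA (assumption)
    esac
}

BACKEND=$(pick_backend "$(uname -m)-$(uname -s)")
# Configure bitsandbytes from a source checkout with the chosen backend:
echo "cmake -DCOMPUTE_BACKEND=${BACKEND} -S ."
```

Run inside a bitsandbytes source checkout, replacing `echo` with the real `cmake` invocation once the printed command looks right.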