
Added Support for Apple Silicon #1289

Open
wants to merge 5 commits into main
Conversation

@shashikanth-a commented Nov 14, 2024

  • Unoptimized
  • No GGUF support yet.
  • Build Triton and bitsandbytes from source
  • `cmake -DCOMPUTE_BACKEND=mps -S .` for the bitsandbytes build (a quick environment check is sketched after this list)
  • `pip install unsloth-zoo==2024.11.4`
  • `pip install xformers==0.0.25`
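
A quick way to verify the environment before building Triton and bitsandbytes against MPS (this check is my addition, not part of the PR description) is to confirm that the installed PyTorch exposes the MPS backend:

```python
import torch

# Sanity check after following the steps above: the Apple Silicon path in this PR
# relies on PyTorch's MPS backend being both built into the wheel and available.
assert torch.backends.mps.is_built(), "PyTorch was built without MPS support"
assert torch.backends.mps.is_available(), "MPS backend not available on this machine"
print(torch.__version__, "- MPS available:", torch.backends.mps.is_available())
```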

- No gguf support yet.
- Build Triton and bitsandbytes from source
- `cmake -DCOMPUTE_BACKEND=hip -S .` for bitsandbytes building
@yukiarimo

Is this working?

@shimmyshimmer
Collaborator

Hi there, thank you for this! We will need a bit more time to review. :)

@mkemka

mkemka commented Nov 21, 2024

Hi @shashikanth-a - thank you for this. Could you please provide information about the environment and package versions you used for development?

@yukiarimo

Hey, does this work with the newly released vision support?

@mkemka

mkemka commented Nov 23, 2024

Currently I can run this if:

  • Decorators of the form `@torch.compile(fullgraph = False, dynamic = True, options = torch_compile_options)` are removed in the llama and Gemma files (see the sketch after this list).
  • I fine-tune llama-3-8b (3.2 1B and 3B throw an error due to RoPE for some reason).
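
For context, here is a minimal sketch of the decorator pattern being removed; `torch_compile_options` below is a stand-in for the options dict Unsloth defines internally, and the function body is illustrative only:

```python
import torch

# Stand-in for Unsloth's internal compile options (assumed values, not the real dict).
torch_compile_options = {"epilogue_fusion": True, "max_autotune": False}

# Decorated form as it appears in the llama/Gemma files: compiled by TorchDynamo/Inductor.
@torch.compile(fullgraph=False, dynamic=True, options=torch_compile_options)
def scaled_residual(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + 0.5 * y

# The workaround above amounts to dropping the decorator so the same code runs eagerly:
def scaled_residual_eager(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + 0.5 * y
```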

- lazy loading of model
- minor refactoring
- optimizers and lr schedulers
- gc
- should improve memory consumption
@mkemka

mkemka commented Nov 26, 2024

With these changes I can run this out of the box using the steps outlined above:

  • Build Triton from source and `pip install -e .`
  • Build bitsandbytes with `cmake -DCOMPUTE_BACKEND=mps -S .` and `pip install -e .`

On an M4 Pro I am getting around 100 t/s for llama3-8b. I can confirm it now also works with llama-3.2-3b.
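
For anyone wanting to reproduce a rough tokens-per-second number like the one above, here is a minimal sketch; the model name, sequence length, and generation length are assumptions rather than the exact setup used in this comment:

```python
import time
import torch
from unsloth import FastLanguageModel

# Assumed model/config for illustration only.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit", max_seq_length=2048
)
FastLanguageModel.for_inference(model)

inputs = tokenizer("Write a haiku about Apple Silicon.", return_tensors="pt").to("mps")

start = time.time()
out = model.generate(**inputs, max_new_tokens=256)
torch.mps.synchronize()  # make sure MPS work has finished before stopping the clock
elapsed = time.time() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```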

@shimmyshimmer
Collaborator

Thanks a lot! Would anyone be so kind as to benchmark this against MLX itself and share the results?

Time taken, amount of VRAM, context length, and whether the losses match. Of course that's a lot, so just the time and checking whether the losses match would be more than helpful. Thank you so much! :)
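
A minimal sketch of the loss check being asked for, assuming both trainers expose a per-step loss list; the helper name and tolerance here are hypothetical:

```python
# Hypothetical helper: given per-step training losses from an Unsloth-on-MPS run and
# an MLX run of the same config, check that they track each other within a tolerance.
def losses_match(unsloth_losses, mlx_losses, atol=5e-2):
    if len(unsloth_losses) != len(mlx_losses):
        return False
    return all(abs(a - b) <= atol for a, b in zip(unsloth_losses, mlx_losses))

print(losses_match([1.92, 1.75, 1.60], [1.93, 1.74, 1.61]))  # True
```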
