Here we provide two AWQ examples, plus a conversion utility:
- Vicuna-7B, an instruction-tuned chatbot.
- LLaVA-13B, a visual language model for multi-modal applications such as visual reasoning.
- A simple script to convert llm-awq weights into the Hugging Face (HF) format.
Here are some example outputs from the two demos. You should be able to observe memory savings when running the demos in 4-bit. Please check the notebooks for details.
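As a rough illustration, here is a minimal sketch of loading a converted checkpoint and checking peak GPU memory, assuming the HF-format weights produced by the conversion script can be loaded through the standard `transformers` interface; the local path below is hypothetical and should point at the conversion script's output.

```python
# Minimal sketch (not part of the repo scripts): load a converted 4-bit checkpoint
# with the standard transformers API and report peak GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to the HF-format checkpoint produced by the conversion script.
model_path = "vicuna-7b-awq-4bit-hf"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="cuda",
)

# Run a short generation so the weights and activations are actually materialized.
prompt = "What is activation-aware weight quantization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Compare this number against the FP16 baseline to see the 4-bit memory saving.
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```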