- A demonstration of running a native LLM on an Android device. Currently supported models:
- Qwen1.5-Chat: 0.5B, 1.8B ...
- MiniCPM-DPO/SFT: 1B, 2.7B
- Yuan2.0-Februa-hf: 2B+
- Octopus-V2: 2.5B
- Gemma1.1-it: 2.5B
- StableLM2-Chat/Zephyr: 1.6B, 3B
- Phi2: 2.7B
- The demo models have been uploaded to Google Drive: https://drive.google.com/drive/folders/1E43ApPcOq3I2xvb9b7aOxazTcR3hn5zK?usp=drive_link
- Baidu Pan: https://pan.baidu.com/s/1NHbUyjZ_VC-o62G13KCrSA?pwd=dake (extraction code: dake)
- After downloading, place the model files in the assets folder.
- Remember to decompress the *.so zip file stored in the libs/arm64-v8a folder.
- The demo models were converted from HuggingFace or ModelScope and optimized in code for maximum execution speed.
- As a result, the inputs and outputs of the demo models differ slightly from those of the original models.
- To better suit ONNX Runtime on Android, the export does not use dynamic axes; the exported ONNX models may therefore not be optimal for x86_64 systems (see the fixed-shape export sketch after this list).
- The tokenizer.cpp and tokenizer.hpp files originated from the mnn-llm repository.
- To export a model yourself, go to the 'Export' folder, follow the comments to set the folder paths, and run the corresponding ***_Export.py script. Then quantize / optimize the ONNX model yourself (see the quantization sketch after this list).
- During the export of MiniCPM-V, the Resampler always fails with an 'aten::_upsample_bilinear2d_aa' operator-not-supported error, so vision interaction is temporarily infeasible (a speculative workaround sketch follows this list).
- See more projects: https://dakeqq.github.io/overview/
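
The sketch below illustrates the fixed-shape export mentioned above: `torch.onnx.export` is called without `dynamic_axes`, so every tensor shape is baked into the graph. This is a minimal sketch; `TinyLM`, the file name, and the 1024 context length are placeholder assumptions, not this repository's actual export code (which lives in the `***_Export.py` scripts).

```python
# Minimal sketch of a fixed-shape ONNX export (no dynamic_axes).
# TinyLM, "model.onnx", and MAX_SEQ_LEN are illustrative assumptions.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Placeholder stand-in for the real LLM wrapper used by ***_Export.py."""
    def __init__(self, vocab_size=32000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids):
        return self.head(self.embed(input_ids))

MAX_SEQ_LEN = 1024  # fixed context length baked into the graph

dummy_ids = torch.zeros((1, MAX_SEQ_LEN), dtype=torch.long)
torch.onnx.export(
    TinyLM().eval(),
    (dummy_ids,),
    "model.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    # dynamic_axes is deliberately omitted: fixed shapes suit ONNX Runtime
    # on Android, but the resulting graph may be suboptimal on x86_64.
    opset_version=17,
)
```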
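
After export, the model still needs the quantization step mentioned above. Below is a minimal sketch using ONNX Runtime's dynamic quantization; judging by the q8f32 naming in the benchmark tables, the demo models use 8-bit weights with float32 activations, but the exact recipe is an assumption here and may differ from what was actually used.

```python
# Dynamic int8 weight quantization sketch; the file names are assumptions,
# and the demo models' actual quantization/optimization recipe may differ.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",         # the freshly exported float32 model
    model_output="model_q8f32.onnx",  # 8-bit weights, float32 activations
    weight_type=QuantType.QInt8,
)
```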
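
Regarding the MiniCPM-V export failure: one speculative, untested workaround is to disable anti-aliasing in the Resampler's bilinear interpolation before export, so the tracer emits the supported aten::upsample_bilinear2d instead of aten::_upsample_bilinear2d_aa. This is not something this project ships, and output quality may change slightly.

```python
# Speculative workaround sketch (not part of this repo): monkey-patch
# F.interpolate to drop antialias=True before torch.onnx.export runs.
import torch.nn.functional as F

_orig_interpolate = F.interpolate

def _interpolate_no_aa(*args, **kwargs):
    kwargs["antialias"] = False  # avoid the unsupported _aa exporter path
    return _orig_interpolate(*args, **kwargs)

F.interpolate = _interpolate_no_aa  # apply before exporting MiniCPM-V
```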
OS | Device | Backend | Model | Inference (1024 Context) |
---|---|---|---|---|
Android 13 | Nubia Z50 | 8_Gen2-CPU (X2+A715) | Qwen1.5-1.8B q8f32 | 14 token/s |
Harmony 4 | P40 | Kirin_990_5G-CPU (2*A76) | Qwen1.5-1.8B q8f32 | 9 token/s |
Harmony 3 | Honor 20S | Kirin_810-CPU (2*A76) | Qwen1.5-1.8B q8f32 | 4.5 token/s |

OS | Device | Backend | Model | Inference (1024 Context) |
---|---|---|---|---|
Android 13 | Nubia Z50 | 8_Gen2-CPU (X2+A715) | MiniCPM-2.7B q8f32 | 7.7 token/s |
Harmony 4 | P40 | Kirin_990_5G-CPU (2*A76) | MiniCPM-2.7B q8f32 | 4.5 token/s |

OS | Device | Backend | Model | Inference (1024 Context) |
---|---|---|---|---|
Android 13 | Nubia Z50 | 8_Gen2-CPU (X2+A715) | Yuan2.0-2B-Februa-hf q8f32 | 10 token/s |
Harmony 4 | P40 | Kirin_990_5G-CPU (2*A76) | Yuan2.0-2B-Februa-hf q8f32 | 5.7 token/s |

OS | Device | Backend | Model | Inference (1024 Context) |
---|---|---|---|---|
Android 13 | Nubia Z50 | 8_Gen2-CPU (X2+A715) | OctopusV2-2B q8f32 | 13 token/s |

OS | Device | Backend | Model | Inference (1024 Context) |
---|---|---|---|---|
Android 13 | Nubia Z50 | 8_Gen2-CPU (X2+A715) | Gemma1.1-it-2B q8f32 | 13 token/s |

OS | Device | Backend | Model | Inference (1024 Context) |
---|---|---|---|---|
Android 13 | Nubia Z50 | 8_Gen2-CPU (X2+A715) | StableLM2-1.6B-Chat q8f32 | 14.9 token/s |
Harmony 4 | P40 | Kirin_990_5G-CPU (2*A76) | StableLM2-1.6B-Chat q8f32 | 9.2 token/s |
Harmony 3 | Honor 20S | Kirin_810-CPU (2*A76) | StableLM2-1.6B-Chat q8f32 | 4.6 token/s |

OS | Device | Backend | Model | Inference (1024 Context) |
---|---|---|---|---|
Android 13 | Nubia Z50 | 8_Gen2-CPU (X2+A715) | Phi2-2B-Orange-V2 q8f32 | 8 token/s |
Harmony 4 | P40 | Kirin_990_5G-CPU (2*A76) | Phi2-2B-Orange-V2 q8f32 | 4.9 token/s |