Please refer to this link for installation and evaluation instructions. If you use this evaluation, please cite the original repository.
To evaluate on GSM8K, please run:
bash zero_eval_mamba.sh MambaInLlama_0_50 gsm
To evaluate on CRUX, please run:
bash zero_eval_mamba.sh MambaInLlama_0_50 crux
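If you want to run both benchmarks back to back, the two commands above can be wrapped in a small loop. The following is only a sketch: the `MODEL` and `DRY_RUN` variables and the `run_eval` helper are hypothetical names introduced here, not part of the repository. It assumes `zero_eval_mamba.sh` is in the current directory and takes the model name and the task name as its two arguments, as in the commands above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: MODEL, DRY_RUN and run_eval are hypothetical names, not repo APIs.
set -eu

MODEL="${MODEL:-MambaInLlama_0_50}"  # model name used in the commands above
DRY_RUN="${DRY_RUN:-1}"              # assumption: 1 = print commands, 0 = execute them

# Print or run the evaluation command for one task (gsm or crux).
run_eval() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "bash zero_eval_mamba.sh $MODEL $1"
  else
    bash zero_eval_mamba.sh "$MODEL" "$1"
  fi
}

# Run the two tasks in sequence; set -e stops at the first failure.
for task in gsm crux; do
  run_eval "$task"
done
```

Saved under a name of your choice (for example `run_all.sh`, also hypothetical), running it with `DRY_RUN=0 sh run_all.sh` would execute both evaluations instead of printing them.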
- GSM:
python src/evaluation/gsm_eval.py
--> Full results
- CRUX:
python src/evaluation/crux_eval.py
--> Full results