To quantize facebook/opt-125m, run the following from the repository root:
python compression/run_compression.py \
--pretrained-model facebook/opt-125m \
--quantized-model-dir quantized_opt125m \
--n-samples 128
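If you prefer to launch the compression step from Python (for instance, to try several values of --n-samples), a minimal standard-library sketch is below. The helper names compression_command and run_compression are hypothetical, not part of the repository. The sketch uses only the flags shown in the command above and assumes it is called from the repository root.

```python
import shlex
import subprocess
import sys


def compression_command(pretrained_model: str, quantized_model_dir: str,
                        n_samples: int = 128) -> list:
    """Build argv for compression/run_compression.py using the flags shown above."""
    return [
        sys.executable,  # the current interpreter, in place of a bare `python`
        "compression/run_compression.py",
        "--pretrained-model", pretrained_model,
        "--quantized-model-dir", quantized_model_dir,
        "--n-samples", str(n_samples),
    ]


def run_compression(pretrained_model: str, quantized_model_dir: str,
                    n_samples: int = 128) -> None:
    """Run the script from the repository root (hypothetical helper)."""
    cmd = compression_command(pretrained_model, quantized_model_dir, n_samples)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Only print the command here; call run_compression(...) to actually launch it.
    print(shlex.join(compression_command("facebook/opt-125m", "quantized_opt125m")))
```

Using sys.executable rather than a literal "python" keeps the subprocess inside whatever virtual environment is currently active.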
To evaluate the quantized model on MMLU, run the following from the repository root:
cd eval/mmlu
./eval_on_mmlu.sh ../../quantized_opt125m facebook/opt-125m /net/nfs.cirrascale/allennlp/akshitab/data/mmlu eval_results
Example output (per-category accuracy, then per-group accuracy, then the overall average):
Average accuracy 0.202 - math
Average accuracy 0.232 - health
Average accuracy 0.219 - physics
Average accuracy 0.270 - business
Average accuracy 0.198 - biology
Average accuracy 0.172 - chemistry
Average accuracy 0.267 - computer science
Average accuracy 0.204 - economics
Average accuracy 0.234 - engineering
Average accuracy 0.238 - philosophy
Average accuracy 0.236 - other
Average accuracy 0.233 - history
Average accuracy 0.177 - geography
Average accuracy 0.204 - politics
Average accuracy 0.225 - psychology
Average accuracy 0.250 - culture
Average accuracy 0.250 - law
Average accuracy 0.212 - STEM
Average accuracy 0.241 - humanities
Average accuracy 0.215 - social sciences
Average accuracy 0.238 - other (business, health, misc.)
Average accuracy: 0.229
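Because every score line follows a fixed pattern, the results are easy to collect programmatically, for example to compare the quantized model against the base model. The following is a minimal sketch that assumes the output looks exactly like the example above. parse_mmlu_output is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

import re
from typing import Optional

# One line per category or group, e.g. "Average accuracy 0.202 - math".
CATEGORY_RE = re.compile(r"^Average accuracy (\d+(?:\.\d+)?) - (.+)$")
# The final summary line, e.g. "Average accuracy: 0.229".
OVERALL_RE = re.compile(r"^Average accuracy: (\d+(?:\.\d+)?)$")


def parse_mmlu_output(text: str) -> tuple[dict[str, float], Optional[float]]:
    """Return ({category_or_group: accuracy}, overall_accuracy or None)."""
    scores: dict[str, float] = {}
    overall: Optional[float] = None
    for raw in text.splitlines():
        line = raw.strip()
        m = CATEGORY_RE.match(line)
        if m:
            scores[m.group(2)] = float(m.group(1))
            continue
        m = OVERALL_RE.match(line)
        if m:
            overall = float(m.group(1))
    return scores, overall


if __name__ == "__main__":
    sample = "Average accuracy 0.172 - chemistry\nAverage accuracy 0.267 - computer science\nAverage accuracy: 0.229\n"
    scores, overall = parse_mmlu_output(sample)
    # Print categories from weakest to strongest, then the overall average.
    for name, acc in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{acc:.3f}  {name}")
    print("overall:", overall)
```

The full label after " - " is used as the key, so "other" and "other (business, health, misc.)" stay separate entries, matching the two distinct lines in the output.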
To measure efficiency, run the following from the repository root:
cd efficiency
./run_efficiency_benchmark.sh facebook/opt-125m quantized_opt125m
Example output:
Time Elapsed: 500.91 s
Max GPU memory usage: 2.09 GiB.
Average GPU power: 9.00e+01 W.
Average power: 2.04e+02 W.
Total energy: 7.49e-02 kWh.
CO2 emission: 6.35e-03 kg.
Throughput: 0.20 instances / s.
Throughput: 47.30 words / s.
Latency: 5009.10 ms / batch.
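The efficiency report can be parsed the same way, for example to tabulate several runs side by side. The sketch below assumes the exact labels and units shown in the example output. The metric keys and both helpers are hypothetical names chosen for illustration. implied_instance_count multiplies elapsed time by instance throughput, which approximates the number of instances processed only if the throughput was measured over the full elapsed time (an assumption).

```python
import re

# A decimal number, optionally in scientific notation (e.g. "500.91", "7.49e-02").
NUM = r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"

# Metric key -> regex for the matching line of the benchmark output.
PATTERNS = {
    "time_elapsed_s": rf"Time Elapsed: {NUM} s",
    "max_gpu_memory_gib": rf"Max GPU memory usage: {NUM} GiB",
    "avg_gpu_power_w": rf"Average GPU power: {NUM} W",
    "avg_power_w": rf"Average power: {NUM} W",
    "total_energy_kwh": rf"Total energy: {NUM} kWh",
    "co2_kg": rf"CO2 emission: {NUM} kg",
    "throughput_instances_per_s": rf"Throughput: {NUM} instances / s",
    "throughput_words_per_s": rf"Throughput: {NUM} words / s",
    "latency_ms_per_batch": rf"Latency: {NUM} ms / batch",
}


def parse_efficiency_output(text: str) -> dict:
    """Extract every metric found in the text; metrics that are absent are skipped."""
    metrics = {}
    for key, pattern in PATTERNS.items():
        m = re.search(pattern, text)
        if m:
            metrics[key] = float(m.group(1))
    return metrics


def implied_instance_count(metrics: dict) -> float:
    """Elapsed time x instance throughput (assumes throughput covers the whole run)."""
    return metrics["time_elapsed_s"] * metrics["throughput_instances_per_s"]
```

Applied to the example output above, this yields nine metrics, and the implied instance count is about 500.91 × 0.20 ≈ 100. Because the printed throughput is rounded to two decimals, treat that figure as approximate.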