LLM Inference

Compress

Run the following to quantize the pretrained model and write it to the output directory:

python compression/run_compression.py \
    --pretrained-model facebook/opt-125m \
    --quantized-model-dir quantized_opt125m \
    --n-samples 128
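
Once compression finishes, quantized_opt125m is an ordinary model directory. The sketch below shows the general shape of post-training quantization using stock PyTorch dynamic int8 quantization; it is illustrative only, not what compression/run_compression.py actually does (the script's own method, which presumably uses the --n-samples calibration examples, may differ):

# Illustrative sketch only: dynamic int8 quantization of OPT-125M with stock
# PyTorch. The repo's run_compression.py may use a different, calibration-based
# method; this just shows the general shape of post-training quantization.
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Replace all Linear layers with int8-weight versions; activations stay fp32.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the quantized weights and tokenizer for later evaluation.
os.makedirs("quantized_opt125m", exist_ok=True)
torch.save(quantized.state_dict(), "quantized_opt125m/pytorch_model.bin")
tokenizer.save_pretrained("quantized_opt125m")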

Run accuracy benchmark

Run the following, passing the quantized model directory, the original pretrained model, the MMLU data directory, and an output directory:

cd eval/mmlu
./eval_on_mmlu.sh ../../quantized_opt125m facebook/opt-125m /net/nfs.cirrascale/allennlp/akshitab/data/mmlu eval_results

Example output:

Average accuracy 0.202 - math
Average accuracy 0.232 - health
Average accuracy 0.219 - physics
Average accuracy 0.270 - business
Average accuracy 0.198 - biology
Average accuracy 0.172 - chemistry
Average accuracy 0.267 - computer science
Average accuracy 0.204 - economics
Average accuracy 0.234 - engineering
Average accuracy 0.238 - philosophy
Average accuracy 0.236 - other
Average accuracy 0.233 - history
Average accuracy 0.177 - geography
Average accuracy 0.204 - politics
Average accuracy 0.225 - psychology
Average accuracy 0.250 - culture
Average accuracy 0.250 - law
Average accuracy 0.212 - STEM
Average accuracy 0.241 - humanities
Average accuracy 0.215 - social sciences
Average accuracy 0.238 - other (business, health, misc.)
Average accuracy: 0.229
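
If you need these numbers programmatically, the printed lines are easy to parse. A small sketch, assuming the stdout above was captured to eval_results/log.txt (a hypothetical filename; eval_results is the output directory passed to the script):

# Parse "Average accuracy <acc> - <category>" lines from the benchmark output.
# The log filename is hypothetical; point it at wherever you captured stdout.
import re

accuracies = {}
with open("eval_results/log.txt") as f:
    for line in f:
        m = re.match(r"Average accuracy ([\d.]+) - (.+)", line.strip())
        if m:
            accuracies[m.group(2)] = float(m.group(1))

print(accuracies["STEM"])  # 0.212 for the run above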

Run efficiency benchmark

Run the following with the pretrained model name and the quantized model directory as arguments:

cd efficiency
./run_efficiency_benchmark.sh facebook/opt-125m quantized_opt125m

Example output:

Time Elapsed: 500.91 s
Max GPU memory usage:  2.09 GiB.
Average GPU power:  9.00e+01 W.
Average power:  2.04e+02 W.
Total energy:  7.49e-02 kWh.
CO2 emission:  6.35e-03 kg.
Throughput:  0.20 instances / s.
Throughput:  47.30 words / s.
Latency:  5009.10 ms / batch.
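
The throughput and latency figures are mutually consistent; a quick back-of-the-envelope check (the one-instance-per-batch reading is inferred from the numbers, not stated by the benchmark):

# Sanity-check the relationships between the reported efficiency numbers.
elapsed_s = 500.91
latency_ms_per_batch = 5009.10
instances_per_s = 0.20
words_per_s = 47.30

num_batches = elapsed_s / (latency_ms_per_batch / 1000)  # ~100 batches
num_instances = instances_per_s * elapsed_s              # ~100 instances -> batch size ~1
total_words = words_per_s * elapsed_s                    # ~23,700 generated words

print(f"{num_batches:.0f} batches, {num_instances:.0f} instances, {total_words:.0f} words")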