conda create -n fveval python=3.10
conda activate fveval
pip install -r requirements.txt
pip install -e .
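A quick sanity check that the editable install succeeded (assuming no further configuration is needed) is to import the package:
# confirm the fv_eval package resolves inside the new environment
python -c "import fv_eval"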
Running the full evaluation scripts and the implemented functionality requires access to a commercial formal verification tool (Cadence Jasper). The code assumes your PATH variable is set up such that the Jasper binary is accessible with the command 'jg'.
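For example, if Jasper were installed under /tools/cadence/jasper/bin (a placeholder path; substitute your site's actual install location), you would extend PATH as follows:
# make the 'jg' binary visible to the run scripts (install path below is illustrative)
export PATH=/tools/cadence/jasper/bin:$PATH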
# Check that Cadence Jasper is accessible globally
jg -no_gui
# Run the following commands in the conda environment created above
# (1) run full flow for NL2SVA-Human
bash scripts/run_nl2sva_human.sh {k: number of ICL examples to use}
# (2) run full flow for NL2SVA-Machine
bash scripts/run_nl2sva_machine.sh {k: number of ICL examples to use}
# (3) run full flow for Design2SVA
bash scripts/run_design2sva.sh {n: number of outputs to sample, with which pass@k k<=n is evaluated}
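For example, with 3 ICL examples for the NL2SVA tasks and 5 sampled outputs for Design2SVA (values chosen purely for illustration):
bash scripts/run_nl2sva_human.sh 3
bash scripts/run_nl2sva_machine.sh 3
bash scripts/run_design2sva.sh 5  # enables pass@k evaluation for any k <= 5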
# Run the following commands in the conda environment created above
# You can supply a list of models to test with the --models flag, with model names separated by semicolons
# Run with --debug to print all inputs to and outputs from the LM
# Change LLM decoding temperature with the --temperature flag
# You can also see the flag options available for each run script by passing the '-h' flag
# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) benchmark:
python run_nl2sva.py --mode machine --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}
# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) benchmark:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}
# Running LM inference on the Design2SVA (SV testbench generation) benchmark:
python run_design2sva.py --models "mixtral-8x22b"
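As a concrete illustration, the flags above can be combined; the model name and values here are arbitrary:
# single model, greedy decoding, verbose logging of LM inputs/outputs
python run_nl2sva.py --mode machine --models "gpt-4" --num_icl 3 --temperature 0.0 --debug
# list all available flag options for a run script
python run_nl2sva.py -h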
Overview of the repository:
fv_eval/
├── fv_eval/
│   ├── benchmark_launcher.py (methods for consuming input benchmark data and running LM inference)
│   ├── evaluation.py (methods for evaluating LM responses)
│   ├── fv_tool_execution.py (methods for launching FV tools, i.e., Cadence Jasper)
│   ├── data.py (definitions for input/output data)
│   ├── prompts_*.py (default prompts for each subtask)
│   └── utils.py (miscellaneous utility functions)
│
├── data_design2sva/
│   ├── data/
│   ├── generate_pipelines_design2sva.py
│   └── generate_fsm_design2sva.py
│
├── data_nl2sva/
│   ├── annotated_instructions_with_signals/
│   ├── annotated_tb/
│   ├── data/
│   ├── machine_tb/
│   ├── generate_nl2sva_human.py
│   └── generate_nl2sva_machine.py
│
├── tool_scripts/
│   ├── pec/ (property equivalence check script)
│   │   └── pec.tcl
│   ├── run_jg_design2sva.tcl
│   ├── run_jg_nl2sva_human.tcl
│   └── run_jg_nl2sva_machine.tcl
│
├── run_evaluation.py
├── run_design2sva.py
├── run_nl2sva.py
│
├── setup.py
└── README.md
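Assuming run_evaluation.py follows the same CLI conventions as the run scripts above (an assumption, since its flags are not listed here), its available options can be inspected with:
python run_evaluation.py -h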
Copyright © 2024, NVIDIA Corporation. All rights reserved. This work is made available under the Apache 2.0 License.