LLMSCAN

LLMSCAN is a tool that parses and analyzes source code to support LLM-based program analysis. Built on Tree-sitter, it identifies and extracts functions from source code along with their metadata, such as function names, line numbers, parameters, call sites, and other program constructs (including branches and loops). It also performs lightweight, parsing-based call graph analysis, which makes code browsing and navigation more effective for real-world programs. The latest version of LLMSCAN supports five programming languages: C, C++, Java, Python, and Go.

Attention: Because of syntax differences between languages, the main branch no longer supports multiple languages. Since 2025/03/01, the active development branches have been cpp, java, python, and go.

Features

Parse source code using Tree-sitter.
Browse code for prompting-based static analysis.
Multi-language support.

Functionality

MetaScan: Extract syntactic facts as function metadata.

You can define your own scanners in the directory src/pipeline.
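
To make the idea of function metadata and a parsing-based call graph concrete, here is a minimal sketch. The `FunctionMeta` record, its field names, and `build_call_graph` are hypothetical and are not taken from the LLMSCAN code base; they only illustrate how call sites collected by a parser could be linked to function definitions by name.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FunctionMeta:
    """Hypothetical record for one function; field names are illustrative."""
    name: str
    start_line: int
    end_line: int
    parameters: list[str] = field(default_factory=list)
    # Callee names exactly as they appear in the function body.
    call_sites: list[str] = field(default_factory=list)


def build_call_graph(functions: list[FunctionMeta]) -> dict[str, set[str]]:
    """Link each function to the callees that are defined in the same project.

    Matching is purely by name, so overloads, function pointers, and
    dynamic dispatch are not resolved.
    """
    defined = {f.name for f in functions}
    graph: dict[str, set[str]] = {f.name: set() for f in functions}
    for f in functions:
        for callee in f.call_sites:
            if callee in defined:
                graph[f.name].add(callee)
    return graph


# Hand-written metadata standing in for what a scanner would extract.
functions = [
    FunctionMeta("main", 1, 10, ["argc", "argv"], ["parse_args", "run", "printf"]),
    FunctionMeta("parse_args", 12, 30, ["argc", "argv"], ["strcmp"]),
    FunctionMeta("run", 32, 50, [], ["parse_args"]),
]
call_graph = build_call_graph(functions)
```

Calls to functions defined outside the project (here `printf` and `strcmp`) are dropped, which keeps the graph limited to code that can actually be browsed.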

Installation

Clone the repository:

git clone git@github.com:PurCL/LLMSCAN.git
cd LLMSCAN

Install the required dependencies:
```
pip install -r requirements.txt
```
Ensure you have the Tree-sitter library and language bindings installed:
```
cd lib
python build.py
```

Configure the keys:

echo 'export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxkey1:sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxkey2:sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxkey3:sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxkey4' >> ~/.bashrc

We suggest including multiple keys to facilitate parallel analysis with high throughput.
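
As an illustration of why several keys help, the sketch below splits a colon-separated key string and hands keys to parallel tasks in round-robin order. It assumes the colon-separated format shown above; `parse_api_keys`, `assign_keys`, and `analyze` are hypothetical helpers, and how LLMSCAN itself distributes keys is not described here.

```python
from __future__ import annotations

import os
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def parse_api_keys(raw: str) -> list[str]:
    """Split a colon-separated string into individual keys, skipping empty parts."""
    return [key for key in raw.split(":") if key]


def assign_keys(tasks: list[str], keys: list[str]) -> list[tuple[str, str]]:
    """Pair each task with a key in round-robin order."""
    if not keys:
        raise ValueError("no API keys configured")
    return list(zip(tasks, cycle(keys)))


def analyze(task_and_key: tuple[str, str]) -> str:
    """Placeholder for a real LLM request; it only reports which key would be used."""
    task, key = task_and_key
    return f"{task} -> key ending in {key[-4:]}"


# Fall back to dummy keys so the sketch runs without a configured environment.
keys = parse_api_keys(os.environ.get("OPENAI_API_KEY", "sk-demo-key1:sk-demo-key2"))
assignments = assign_keys(["func_a", "func_b", "func_c"], keys)

# One worker per key keeps every key busy without exceeding the key count.
with ThreadPoolExecutor(max_workers=len(keys)) as pool:
    results = list(pool.map(analyze, assignments))
```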

Similarly, the other two keys can be set as follows:

echo 'export REPLICATE_API_TOKEN=xxxxxx' >> ~/.bashrc
echo 'export GEMINI_KEY=xxxxxx' >> ~/.bashrc

Quick Start

Prepare the project that you want to analyze. Here we use the Linux kernel as an example:
```
cd benchmark
mkdir C && cd C
git clone git@github.com:torvalds/linux.git
```
You can also use our provided benchmark programs to run a demo.

Run the analysis to extract the metadata of each function:
```
cd src
./run.sh
```

The output files are written to the log directory.

How to Extend

More Program Facts

You can implement your own analysis by adding more modules, such as additional parsing-based primitives (in parser/program_parser). If you want to derive semantic facts that are beyond the reach of parsing-based analysis, you can customize the prompts and use LLMs to derive them.
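
A customized prompt could be assembled from extracted function metadata roughly as follows. This is only a sketch: the template wording, the `build_prompt` helper, and the example question are assumptions for illustration and do not reflect the prompts shipped with LLMSCAN.

```python
from string import Template

# Hypothetical prompt template; adjust the wording to the semantic fact you need.
PROMPT = Template(
    "You are analyzing a $language function.\n"
    "Function name: $name\n"
    "Source:\n$source\n\n"
    "Question: $question\n"
    "Answer with 'yes' or 'no' followed by a one-sentence justification."
)


def build_prompt(language: str, name: str, source: str, question: str) -> str:
    """Fill the template with the metadata of one function."""
    return PROMPT.substitute(
        language=language, name=name, source=source, question=question
    )


prompt = build_prompt(
    language="C",
    name="free_buffer",
    source="void free_buffer(char *buf) { free(buf); }",
    question="Does this function release memory passed in by the caller?",
)
```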

More Programming Languages

The framework is language-agnostic. To migrate the current implementations to other programming languages or extract more syntactic facts, please refer to the grammar files in the corresponding Tree-sitter libraries and refactor the code in parser/program_parser.py. In most cases, you only need to change the node types passed to find_nodes_by_type.
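
The sketch below shows why switching languages mostly comes down to switching node types. The `find_nodes_by_type` here is an illustrative reimplementation, not the function from parser/program_parser.py, and `Node` is a small stand-in for a Tree-sitter node (which likewise exposes `type` and `children`). The node type names in `FUNCTION_NODE_TYPES` are assumptions that should be checked against each grammar.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a Tree-sitter node; only `type` and `children` are modeled."""
    type: str
    children: list[Node] = field(default_factory=list)


# Assumed node types for function-like definitions; verify against the grammars.
FUNCTION_NODE_TYPES = {
    "c": ["function_definition"],
    "cpp": ["function_definition"],
    "java": ["method_declaration"],
    "python": ["function_definition"],
    "go": ["function_declaration", "method_declaration"],
}


def find_nodes_by_type(root: Node, node_type: str) -> list[Node]:
    """Collect every node of the given type in pre-order (illustrative version)."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type == node_type:
            found.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


# A hand-built tree shaped like a Java class with two methods and one field.
tree = Node("program", [
    Node("class_declaration", [
        Node("class_body", [
            Node("method_declaration"),
            Node("field_declaration"),
            Node("method_declaration"),
        ]),
    ]),
])

java_methods = [
    node
    for node_type in FUNCTION_NODE_TYPES["java"]
    for node in find_nodes_by_type(tree, node_type)
]
```

Supporting another language in this sketch means adding one entry to `FUNCTION_NODE_TYPES`; the traversal itself does not change.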

Here are the links to grammar files in Tree-sitter libraries targeting mainstream programming languages:

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

License

This project is licensed under the MIT License.

Contact

For any questions or suggestions, please contact [email protected].
