Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks

About

This repository contains the evaluation scripts and results described in the manuscript "Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks", submitted to ICPM 2024.

Data

The corpus and datasets can be downloaded from here: datasets

ICL task descriptions (taken from llm/prompts.py)

  • T-SAD: Given a set of activities that constitute an organizational process and a sequence of activities, determine whether the sequence is a valid execution of the process. The activities in the sequence must be performed in the correct order for the execution to be valid. Provide either True or False as the answer and nothing else.
  • A-SAD: You are given a set of activities that constitute an organizational process and two activities performed in a single process execution. Determine whether it is valid for the first activity to occur before the second. Provide either True or False as the answer and nothing else.
  • S-NAP: You are given a list of activities that constitute an organizational process and a sequence of activities that have been performed in the given order. Which activity from the list should be performed next in the sequence? The answer should be one activity from the list and nothing else.
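For illustration, here is a minimal sketch of how the T-SAD description above could be turned into a concrete prompt for a model. The actual prompt construction lives in llm/prompts.py; the rendering of the activity set and the trace below is an assumption, not the repository's code.

# Illustrative only: the repository's real prompt building is in llm/prompts.py.
T_SAD_INSTRUCTION = (
    "Given a set of activities that constitute an organizational process and a "
    "sequence of activities, determine whether the sequence is a valid execution "
    "of the process. The activities in the sequence must be performed in the "
    "correct order for the execution to be valid. Provide either True or False "
    "as the answer and nothing else."
)

def build_t_sad_prompt(activities, trace):
    # Hypothetical rendering; the repository's actual prompt format may differ.
    activity_set = ", ".join(sorted(activities))
    sequence = " -> ".join(trace)
    return f"{T_SAD_INSTRUCTION}\nActivities: {activity_set}\nSequence: {sequence}\nAnswer:"

print(build_t_sad_prompt(
    {"create order", "ship order", "send invoice"},
    ["create order", "send invoice", "ship order"],
))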

Results

The results of the experiments can be found in the 'eval/' folder, stored in .csv format.
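A minimal sketch for inspecting one of the result files with pandas (the file name below is a placeholder; the available columns depend on the experiment):

import pandas as pd

# Placeholder file name; pick any .csv file from the eval/ folder.
results = pd.read_csv("eval/some_results.csv")
print(results.columns.tolist())
print(results.head())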

Running the Experiments

Hardware Requirements

An Nvidia GPU with at least 42 GB of memory is required to run the experiments (e.g., an RTX A6000).

Setup

via pip

First, set up a virtual environment (conda is useful for this), then install the dependencies:

pip install -r requirements.txt
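For example, a full setup could look like this (the environment name and Python version are arbitrary choices, not prescribed by the repository):

conda create -n llms4pm python=3.10
conda activate llms4pm
pip install -r requirements.txt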

In-Context Learning

  1. Create a folder called 'data/' one level above the project root folder.

  2. Place the downloaded dataset into the 'data/' folder.

  3. Place the 'train_val_test.pkl' file in the 'data/' folder (the resulting layout is sketched after this list).

  4. To run the ICL experiments, execute the following command in the project root folder (set the parameters of your choice; the configuration used in the paper corresponds to the example values listed in the parameter descriptions below):

python evaluate_llm.py --task --device --hf_model --rand_shots --runs --num_samples
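With steps 1-3 in place, the expected folder layout looks roughly as follows (names other than 'data/' and 'train_val_test.pkl' are placeholders):

parent_folder/
├── data/
│   ├── <downloaded dataset files>
│   └── train_val_test.pkl
└── llms4pm/              <- project root, run evaluate_llm.py from here
    └── evaluate_llm.py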

Parameters

--task: one of "out_of_order", "trace_anomaly", "next_activity"

--device: e.g., "cuda:0" for the first GPU on your machine

--hf_model: the Huggingface name of the model, e.g., "meta-llama/Meta-Llama-3-8B-Instruct" or "mistralai/Mistral-7B-Instruct-v0.2"

--rand_shots: a list with the numbers of shots to include, e.g., "[3,5]" for 6 and 10 shots (twice the given amount is taken; for the binary tasks, each pair consists of one positive and one negative example from the same process)

--runs: the number of runs to execute

--num_samples: the number of samples to draw from the test set, e.g., 20000
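Putting the parameters together, an invocation using the example values above could look like this (the value for --runs is an arbitrary choice):

python evaluate_llm.py --task "out_of_order" --device "cuda:0" --hf_model "meta-llama/Meta-Llama-3-8B-Instruct" --rand_shots "[3,5]" --runs 1 --num_samples 20000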

Fine-tuning

The fine-tuning experiments are run using the Trident framework. See this repository for our fine-tuning evaluation scripts and instructions on how to use them: https://github.com/fdschmidt93/trident-bpm

  • The sub-folder 'bpm' contains the necessary preprocessing and evaluation code.
  • The individual tasks can be run using bash scripts (an example invocation follows this list):
    • 'pair.sh' for A-SAD using LLMs
    • 'trace.sh' for T-SAD using LLMs
    • 'activity.sh' for S-NAP using LLMs
    • 'trace_activity.sh' for multi-task T-SAD and S-NAP using LLMs
    • 'pair_roberta.sh' for A-SAD using RoBERTa
    • 'trace_roberta.sh' for T-SAD using RoBERTa
    • 'activity_roberta.sh' for S-NAP using RoBERTa
    • 'trace_activity_roberta.sh' for multi-task T-SAD and S-NAP using RoBERTa
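For example, the A-SAD LLM experiment would be started with the following command (assuming the scripts are executed from the root of the trident-bpm repository; see that repository for the exact configuration options):

bash pair.sh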
