Clinical Trials Search Engine for Information Retrieval and Recommender System Project
This project has been developed for academic purposes only.
This project is released under the GNU General Public License.
Clone the repository locally:
- Git:
git clone https://github.com/MikiTwenty/ClinicalTrialsSE
- GitHub CLI:
gh repo clone MikiTwenty/ClinicalTrialsSE
- Virtual Environment with Python >= 3.10
- Required Python packages (see requirements.txt).
- Java SDK
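The requirements above can be checked with a short script before installing anything. This is only a sketch: the function name `check_environment` is hypothetical and not part of the project, and looking for `java` on the PATH is an assumption, since the project may locate the Java SDK through a configured folder instead.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report whether the interpreter and Java look usable (hypothetical helper)."""
    return {
        # The running interpreter must be at least the required version.
        "python_ok": sys.version_info >= min_python,
        # Assumption: a `java` executable on the PATH indicates an installed Java SDK.
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `java_on_path` is not necessarily a problem if the Java SDK folder is passed to the project explicitly.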
Get files from Hugging Face:
git lfs install
git clone https://huggingface.co/medicalai/ClinicalBERT
You can find the models on Hugging Face. Suggested models are those with 7B parameters, such as:
- PyTorch : see documentation
- llama-cpp-python : see documentation
Initialize the Graphical User Interface to run the SE.
Initialize the Graphical User Interface (powered by NiceGUI).
Parameters
- paths (Dict[str, str]): The paths dictionary. The paths keys must be:
  - 'DATA': the folder containing all the data;
  - 'DATASET': the dataset.pkl file;
  - 'DOCUMENTS': the folder containing all the dataset files;
  - 'INDEXING_FILES': the folder for the indexing files;
  - 'LLM': the folder containing the LLM GGUF files;
  - 'BERT': the folder containing the ClinicalBERT model files;
  - 'JDK': the folder containing the Java SDK;
  - 'EVAL': the folder containing the evaluation files (optional);
  - 'LOG': the log.txt file (optional).
- verbose (bool): If True, enables verbose output.
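As an illustration of the parameters above, the following sketch builds a paths dictionary and checks that the required keys are present. The folder layout (`data/`, `models/`, and so on) and the helper names `build_paths` and `missing_keys` are assumptions made for this example and are not taken from the project; only the key names come from the parameter description.

```python
from pathlib import Path
from typing import Dict, List

# Key names as listed in the parameter description.
REQUIRED_KEYS = ("DATA", "DATASET", "DOCUMENTS", "INDEXING_FILES", "LLM", "BERT", "JDK")
OPTIONAL_KEYS = ("EVAL", "LOG")


def build_paths(root: str) -> Dict[str, str]:
    """Build a paths dictionary under `root` (the layout is hypothetical)."""
    base = Path(root)
    data = base / "data"
    models = base / "models"
    return {
        "DATA": str(data),
        "DATASET": str(data / "dataset.pkl"),
        "DOCUMENTS": str(data / "documents"),
        "INDEXING_FILES": str(data / "indexing"),
        "LLM": str(models / "llm"),
        "BERT": str(models / "ClinicalBERT"),
        "JDK": str(base / "jdk"),
        "EVAL": str(data / "eval"),      # optional
        "LOG": str(base / "log.txt"),    # optional
    }


def missing_keys(paths: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent from `paths`."""
    return [key for key in REQUIRED_KEYS if key not in paths]


if __name__ == "__main__":
    paths = build_paths(".")
    print("Missing required keys:", missing_keys(paths))
```

Checking the keys before starting the GUI gives a clearer error than a failure deep inside initialization.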
Initialize the SE and start the GUI.
- To use the SE, run run.py
- To preprocess the data, run train.py
- University of Pavia
- Artificial Intelligence BSc
- Information Retrieval and Recommender Systems
- Authors: Michele Ventimiglia (@MikiTwenty), Manuel Dellabona (@manudella)