Skip to content

zach-zhiling-zheng/ChatGPT_Chemistry_Assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ChatGPT_Chemistry_Assistant

ChatGPT Chemistry Assistant

Please check out https://pubs.acs.org/doi/10.1021/jacs.3c05819 for more details.

Step-by-step illustrations of setting up Processes 1, 2, and 3 were shown in the Supporting Information file of this article in the cookbook style.

If you find this work helpful to your research, kindly consider citing the following:

Zheng, Z.; Zhang, O.; Borgs, C.; Chayes, J. T.; Yaghi, O. M., ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis. J. Am. Chem. Soc. 2023. (DOI: 10.1021/jacs.3c05819)

Thank you!

Contents

· Text Mining: PDF Text Processing and Analysis with OpenAI's gpt-3.5-turbo API or gpt-4 API

· MOF Chatbot: a chatbot answers question based on post text mining data

· Predictive Model: A RF classfifier trained on post text mining data

Features

This text mining assistant includes the following main functions:

· Extraction of text from PDF files and its division into smaller chunks.

· Classfication of text segments.

· Processing and summarization of the extracted text data.

· Conversion of summarized data into a tabular format.

· Calculation of text embeddings using the OpenAI API.

· Selection of top similarity sections and their neighbors in the data.

· Calculation of text token count using the tiktoken library.

This MOF Synthesis Assistant tool provides the following core functionalities:

· Extraction of synthesis information and embeddings from a CSV file.

· Calculation of similarity scores.

· Sorting of text segments based on their similarity scores.

· Selection of top similar synthesis conditions from the sorted data.

· Processing of multiple user questions to maintain a conversational context.

· Use of the OpenAI API to generate text embeddings for user's questions based on the selected synthesis conditions.

· Maintenance of a conversation history for better contextually accurate responses in a conversational interface.

· A user-friendly conversational interface for asking questions related to MOF synthesis conditions.

This machine learning tool includes the following primary functions:

· Data Preprocessing: Reads, processes, and drops unused data columns from CSV file.

· Feature Selection: Applies RFECV for robust feature selection.

· Data Splitting: Splits data into training and testing sets with various sizes.

· Hyperparameter Tuning: Performs tuning via RandomizedSearchCV for RandomForestClassifier.

· Model Evaluation: Computes several performance metrics for each model configuration.

· Optimal Model Selection: Selects the best performing model based on balanced accuracy.

· Random Splits: Supports multiple random states for data splitting.

· Reporting: Records all performance metrics in an organized format for model comparison.

Dependencies

· This project is built on Python and requires the following libraries:

openai

requests

PyPDF2

pandas

tiktoken

sklearn

numpy

mendeleev

About

ChatGPT Chemistry Assistant

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published