USTC-GPT

Introduction

We trained a large language model dedicated to serving as a USTC campus guide, realizing "LLM+USTC" and providing a full range of services for the USTC community.

We collected information from campus websites, forums, official accounts, and other sources, and we regularly fine-tune the model as teachers and students use it. USTC-GPT aims to provide the following search and question-answering functions:

(1) Study Guidance: USTC-GPT can give students guidance on their studies, such as the school's rules on course selection, course withdrawal, grade abandonment, and change of major, along with suggestions for studying individual courses.

(2) Living Services: USTC-GPT can provide students with a variety of living services and suggestions, such as how to recharge the campus card, report it lost, or replace it, introductions to the on-campus express delivery sites and sports venues, and contact information for school teachers.

(3) Admissions Publicity: Based on the questions and concerns of parents and candidates, USTC-GPT can provide detailed and accurate admissions information and promotional materials, including admissions policies, admission standards, introductions to majors, and employment prospects, to showcase the characteristics and strengths of USTC.

(4) Visitor Guide: USTC-GPT can provide guidance and services for campus visitors, including recommended sights and canteens, and can introduce the history and current development of USTC.

Data

Initially, we collected raw data from campus websites, forums, official accounts, and other sources. The main sources are as follows:

(1) Official websites, including faculty information and departmental news and notices.

(2) The Entrance Guide for freshmen, edited by senior students.

(3) Social media, including the Course Review Platform and the Nan Qi Forum, mainly focusing on blogs and comments that offer useful guidance on campus life at USTC.

Each raw record is stored with a unique ID, which prevents the confusion caused by duplicate data.
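The README does not say how the IDs are generated. One possible approach, shown here only as a minimal sketch, is to derive the ID from a hash of the source name and the whitespace-normalized text, so that a repeated record maps to the same ID and is stored once. The function names `make_record_id` and `store_records`, the record layout, and the sample strings are all hypothetical and are not taken from the repository.

```python
import hashlib


def make_record_id(source: str, text: str) -> str:
    """Derive a stable ID from the source name and whitespace-normalized text.

    Assumption: content hashing is used for IDs; the project may do otherwise.
    """
    normalized = " ".join(text.split())
    digest = hashlib.sha256(f"{source}\n{normalized}".encode("utf-8"))
    return digest.hexdigest()[:16]


def store_records(raw_items):
    """Keep one record per ID; later duplicates of the same ID are skipped."""
    store = {}
    for source, text in raw_items:
        record_id = make_record_id(source, text)
        if record_id not in store:
            store[record_id] = {"id": record_id, "source": source, "text": text}
    return store


# Made-up sample records, for illustration only.
raw_items = [
    ("official_site", "Course selection opens in week 1."),
    ("official_site", "Course  selection opens in week 1."),  # same text, extra space
    ("forum", "The east campus canteen is open late."),
]
store = store_records(raw_items)
print(len(store))  # the two official_site entries collapse into one record
```

Hashing the normalized text means that trivially different copies of the same page (extra spaces, line breaks) still collapse to one entry, at the cost of treating any real wording change as a new record.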

Technical Route

We combined a vector database with parameter-efficient fine-tuning, and explored and implemented retrieval-augmented generation (RAG) to reduce LLM hallucinations.
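The retrieval step of RAG can be sketched in a few lines: embed the question, rank stored passages by similarity, and place the best matches in the prompt so the model answers from retrieved text rather than from memory. The sketch below is an illustration under stated assumptions, not the project's code. It uses a toy bag-of-words embedding and an in-memory list in place of a real embedding model and vector database, and the names `embed`, `retrieve`, and `build_prompt`, as well as the sample passages, are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, top_k=2):
    """Return the top_k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, passages, top_k=2):
    """Assemble a prompt that grounds the answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Made-up passages, for illustration only.
passages = [
    "campus card recharge is available at the service hall",
    "course withdrawal must be requested before the deadline",
    "the express site is next to the west gate",
]
prompt = build_prompt("how do I recharge my campus card", passages, top_k=1)
print(prompt)
```

In a real deployment the `embed` function would call a trained embedding model and the sorted list would be replaced by a similarity search in the vector database.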

The vector database, implemented with Milvus, extends the knowledge boundary of the large language model. Parameter-efficient fine-tuning adapts the model to the actual situation of the school: we generated fine-tuning datasets and fine-tuned the model with LoRA.
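For readers unfamiliar with LoRA, the idea is to freeze the base weight matrix W and train only a low-rank update, so the effective weight becomes W + (alpha / r) * B A, where A is r x d_in and B is d_out x r. The sketch below illustrates that arithmetic in plain Python. It is not the project's training code: the dimensions, the `alpha` value, and the helper names `matmul` and `lora_weight` are assumptions chosen for illustration, and the README does not state which library or hyperparameters were used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(lora_a)
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical sizes; real models use far larger d_in and d_out.
d_out, d_in, r = 8, 8, 2
w = [[float(i == j) for j in range(d_in)] for i in range(d_out)]  # frozen base weight
lora_a = [[0.1] * d_in for _ in range(r)]    # r x d_in, trainable
lora_b = [[0.0] * r for _ in range(d_out)]   # d_out x r, trainable, zero-initialized

merged = lora_weight(w, lora_a, lora_b, alpha=4)

full_params = d_out * d_in          # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)    # parameters updated by LoRA
print(full_params, lora_params)
```

Because B starts at zero, the merged weight initially equals the base weight, so training begins from the unmodified model. The trainable parameter count grows with r * (d_in + d_out) instead of d_in * d_out, which is where the savings come from when r is much smaller than the matrix dimensions.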

Demo

demo.mp4

