Project elaborated by Ramon Grande Da Luz Bouças as a parcial requirement for obtaining a Bachelor of Science degree in computer science at CEFET/RJ
This repository contains TextGrader. In essence textgrader contains the various versions of a Essay and short answer evaluation system.
The system has 4 parts:
- Preprocessing where we correct spelling change columns schema and do other minor preprocessing steps
- Feature engineering, where we generate some basic features like word count and sentence count, and generate datasets embedding words with each one of the following 4 techniques: TF-IDF, WORD-2-VEC, USE, LSI.
- Model training, where we train some instances of a random forest model using one of the following 3 approaches: Regression, Classification and Ordinal Classification.
- Model Evaluation, where we use the trained models to generate predictions and evaluate those predictions.
Our code uses python 3.9 and some libraries like scikit-learn and NLTK. We recommend using anaconda or miniconda. If you are using one of these, you can easily setup your environment in order to run our code using the following code
conda create -n [choose name] python=3.9
pip install requirements.txt
The training datasets used for both essay evaluation task and short answer evaluation task are avaliable at
The dataset essay.xlsx belongs in the datalake/essay/raw folder
and the dataset short_answer.xlsx belongs in the datalake/short_answer/raw folder
However, to run the system there is no need to download and place the datasets at the aforementioned folders, because we already submitted the project with
the proper files on proper folders.
Being in the main directory, do:
Before running the tasks for evaluating essays or short answers, it is necessary to run the spell corrector for both texts
- python run.py task general_tasks task_correct_essays
- python run.py task general_tasks task_correct_short_answers
To run the pipeline for essays run: python run.py task general_tasks task_pipeline_essays
To run the pipeline for short answer run: python run.py task general_tasks task_pipeline_short_answer
To give your opinion about this work, send an email to [email protected]