This tool analyzes input text and suggests improvements based on semantic similarity to a list of standard phrases. It provides both a command-line interface (CLI) and a simple web-based user interface (UI).
- spaCy: Used for natural language processing (NLP) tasks, including tokenization and syntactic analysis.
- Sentence Transformers: Employed for generating contextualized embeddings of phrases, enabling semantic similarity calculations.
- Flask: Integrated into a lightweight web application for user-friendly interaction through a web interface.
spaCy was chosen for its efficient tokenization and syntactic analysis. Its pre-trained English model, "en_core_web_md", provides accurate tokenization and dependency parsing, which support the semantic analysis the Text Improvement Engine relies on. The model can be swapped for "en_core_web_lg" for slightly better accuracy, or for "en_core_web_sm" for slightly faster processing.
The Sentence Transformers library was selected to encode input phrases and standard terms into embeddings. This choice was motivated by its ability to capture contextual information, enhancing the semantic similarity calculations crucial for suggesting suitable replacements.
Flask, a micro web framework, was chosen for its simplicity and ease of integration. The lightweight nature of Flask allows for quick development and deployment of a user interface.
The similarity threshold (set at 0.45) was chosen empirically to balance precision and recall. Suggestions are ranked by similarity score, so users can focus on the most relevant replacements first.
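The thresholding and ranking step can be sketched as follows. This is a minimal illustration, not the code in suggestions.py: the names `cosine_similarity` and `rank_suggestions` are hypothetical, the embeddings are plain lists of floats standing in for Sentence Transformers output, and the sketch assumes a candidate is kept when its score is at or above the threshold.

```python
import math

SIMILARITY_THRESHOLD = 0.45  # value stated in this README


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_suggestions(phrase_vec, term_vecs, threshold=SIMILARITY_THRESHOLD):
    """Return (term, score) pairs at or above the threshold, best first.

    term_vecs maps each standard term to its embedding (hypothetical layout).
    """
    scored = [
        (term, cosine_similarity(phrase_vec, vec))
        for term, vec in term_vecs.items()
    ]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D vectors for illustration only; real embeddings have hundreds of dimensions.
    toy_terms = {
        "term A": [1.0, 0.0],
        "term B": [0.0, 1.0],
        "term C": [1.0, 1.0],
    }
    for term, score in rank_suggestions([1.0, 0.2], toy_terms):
        print(f"{term}: {score:.2f}")
```

With the toy vectors above, "term B" falls below 0.45 and is dropped, while the remaining terms are listed from highest to lowest score.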
The CLI provides a straightforward way to assess the Text Improvement Engine quickly. Users can run the CLI script, input file names, and receive instant suggestions and similarity scores.
/text_improvement_engine
|-- output
| |-- suggestions.json
|-- src
| |-- input_text.txt
| |-- standard_terms.csv
|-- templates
| |-- index.html
|-- cli.py
|-- README.md
|-- requirements.txt
|-- suggestions.py
|-- ui.py
|-- utils.py
- Clone this repository:
git clone https://github.com/imasloff/text-improvement-engine.git
- Install dependencies:
pip install -r requirements.txt
- Run the CLI script:
python cli.py
Follow the prompts to enter the standard terms file and the file containing the input text, or skip a prompt to use the corresponding default file.
- Run the UI script:
python ui.py
- Open your web browser and go to http://127.0.0.1:5000/.
- Enter the text in the provided textarea and upload the standard terms file (in TXT or CSV format). You can skip the file upload to use the default standard terms.
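Because the standard terms can arrive as either a TXT or a CSV file, a loader along these lines could handle both, with a fallback to the default file. This is a sketch under assumptions: the function name `load_standard_terms` is hypothetical, a TXT file is assumed to hold one term per line, and a CSV file is assumed to hold the term in the first column of each row with no header row. The actual parsing in utils.py may differ.

```python
import csv
from pathlib import Path

# Default file taken from the project tree in this README.
DEFAULT_TERMS_PATH = Path("src/standard_terms.csv")


def load_standard_terms(path=None):
    """Read standard terms from a TXT or CSV file, falling back to the default."""
    path = Path(path) if path else DEFAULT_TERMS_PATH
    suffix = path.suffix.lower()
    if suffix not in {".txt", ".csv"}:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")

    with path.open(newline="", encoding="utf-8") as handle:
        if suffix == ".csv":
            # Assumption: the first column of each row holds the term.
            raw = [row[0] for row in csv.reader(handle) if row]
        else:
            # Assumption: one term per line.
            raw = handle.read().splitlines()

    # Drop blank entries and surrounding whitespace.
    return [term.strip() for term in raw if term.strip()]
```

Passing no argument, as in `load_standard_terms()`, mirrors the "skip the upload" path and reads the default terms file.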
The results are saved in the output directory. The CLI prints suggestions to the console. The UI generates a JSON file (suggestions.json) containing the replacement suggestions with their similarity scores and also displays them on the page.
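Writing the results file might look like the sketch below. The helper name `save_suggestions` and the record keys (`phrase`, `suggestion`, `similarity`) are assumptions made for illustration; the schema that the engine actually writes to suggestions.json is not documented here.

```python
import json
from pathlib import Path


def save_suggestions(suggestions, out_dir="output"):
    """Write suggestions to <out_dir>/suggestions.json and return the file path."""
    out_path = Path(out_dir) / "suggestions.json"
    # Create the output directory on first run.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as handle:
        json.dump(suggestions, handle, indent=2, ensure_ascii=False)
    return out_path


if __name__ == "__main__":
    # Hypothetical record layout; the values are placeholders, not real results.
    example = [
        {"phrase": "original phrase", "suggestion": "standard term", "similarity": 0.5},
    ]
    print(save_suggestions(example))
```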