- Clone the git repo to a local directory
- Open a terminal and change into that directory
- Create a Python environment by executing "python3 -m venv py_env"
- Activate the environment by executing "source py_env/bin/activate"
- Install the requirements by executing "pip install -r requirements.txt"
- Create a copy of config.json and edit the copy to include the domain you want to parse
- STEP 1: Execute the crawler by running "python crawlScrapy.py -c="
- After this completes, confirm that a new directory "text<new domain>" was created and contains text files
- Set the environment variable containing your OpenAI key: "export OPENAI_API_KEY="
- STEP 2: Run "python createEmbeddings.py -c=". This creates the embeddings.csv and scraped.csv files under the "processed" directory
- STEP 3: Assuming no errors, run "python answer.py -c="
- Press Enter on an empty line to exit
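The steps above can be sketched as a single shell session. The repo URL, the config file name (myconfig.json), and the value passed to "-c=" are placeholders, not taken from this document; check the repo's own config.json for its real schema before editing:

```shell
# Clone the repo and enter it (<repo-url> is a placeholder)
git clone <repo-url> crawler && cd crawler

# Create and activate a virtual environment, then install dependencies
python3 -m venv py_env
source py_env/bin/activate
pip install -r requirements.txt

# Copy config.json and edit the copy to set the domain you want to parse
# (the exact keys are defined by the repo's config.json, not shown here)
cp config.json myconfig.json

# Make the OpenAI key available to the scripts
export OPENAI_API_KEY="sk-..."

# Step 1: crawl, Step 2: build embeddings, Step 3: interactive Q&A
python crawlScrapy.py -c=myconfig.json
python createEmbeddings.py -c=myconfig.json
python answer.py -c=myconfig.json
```

After the crawl, the "text<new domain>" directory should contain the scraped text files, and after step 2 the "processed" directory should contain embeddings.csv and scraped.csv.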
If needed, install the optional plotting/analysis packages:
pip install matplotlib
pip install plotly
pip install scipy
pip install -U scikit-learn