casezhao/openai-embbed-py
Embedding

  • Quick, accurate answers (a minimal lookup sketch follows this list)
  • Cheap
  • Cannot extend beyond the provided question/answer pairs
  • Needs more data to be useful (to cover all the questions users might ask)
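A minimal sketch of the embedding-lookup approach, assuming the `openai` Python client and a small list of prepared question/answer pairs; `qa_pairs`, the placeholder answers, and the helper names are illustrative, not taken from this repo:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prepared question/answer pairs; the real data lives elsewhere.
qa_pairs = [
    ("How much sugar should I eat per day?", "(prepared answer about daily sugar intake)"),
    ("Is a diet high in sugar and carbs bad for me?", "(prepared answer about high-sugar diets)"),
]

def embed(texts, model="text-embedding-ada-002"):
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model=model, input=texts)
    return np.array([d.embedding for d in resp.data])

question_vectors = embed([q for q, _ in qa_pairs])

def lookup(question: str) -> tuple[str, float]:
    """Return the answer of the closest prepared question and its cosine similarity."""
    q_vec = embed([question])[0]
    sims = question_vectors @ q_vec / (
        np.linalg.norm(question_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    best = int(np.argmax(sims))
    return qa_pairs[best][1], float(sims[best])
```

Because the lookup only ever returns a prepared answer, it is fast and cheap, but it cannot say anything the question/answer list does not already contain.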

Index Store (LlamaIndex)

  • Slower (~18 s/prompt before optimisation)
  • Consumes more resources (tokens) and is therefore much more expensive
  • Can extend beyond the provided context via response synthesis
  • Less accurate than embedding lookup, but acceptable for most cases (see the sketch after this list)
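A minimal sketch of the LlamaIndex path, assuming the source documents sit in a local `data/` folder; module paths vary across LlamaIndex versions (older releases import directly from `llama_index`):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build a vector index over the source documents and synthesise an answer.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("Is a diet high in sugar and carbs bad for me?")
print(response)
```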

Proposed

  • Enhance embedding lookup with response engineering
  • Combine LlamaIndex with embedding lookup to cover a wider range of cases (see the routing sketch after this list)
  • Optimise LlamaIndex to be cheaper and faster
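One way the combination could be wired, sketched under the same assumptions as above (the `lookup` helper and `query_engine` come from the previous snippets; the threshold is an illustrative cut-off, not a value from this repo):

```python
SIMILARITY_THRESHOLD = 0.9  # illustrative; tune against real user queries

def answer(question: str) -> str:
    best_answer, similarity = lookup(question)   # cheap embedding pass first
    if similarity >= SIMILARITY_THRESHOLD:
        return best_answer                       # close match: answer directly
    # No close prepared answer: fall back to LlamaIndex response synthesis.
    return str(query_engine.query(question))
```

The cheap, accurate embedding lookup handles the questions it was prepared for, and the slower, more expensive LlamaIndex query engine only runs when no prepared answer is close enough.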

Index Optimization

  • A Vector Index seems most appropriate for a semantic search task such as this
  • HyDE was tried, but it does not improve the current model much, and it consumes more time and tokens
  • Single-step query decomposition is potentially useful: it can answer more complex queries by transforming the query into sub-questions and checking them against the index. The current model does not have this capability, but it will be integrated in the future to cover more use cases.
  • Default ranking suffers from some hallucination (e.g. "How much carb and sugar should I eat?", "Is a diet high in sugar and carb bad for me?"). The Cohere Rerank node post-processor was employed to mitigate this (see the sketch after this list), but it produced even more alien results.
  • Switching to curie produces significantly faster responses, but accuracy suffers. Further engineering is required to optimise this process.
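The rerank experiment roughly corresponds to attaching the Cohere Rerank node post-processor to the query engine. A sketch, assuming the `llama-index-postprocessor-cohere-rerank` package and the `index` built in the earlier snippet; import paths differ between LlamaIndex versions:

```python
import os
from llama_index.postprocessor.cohere_rerank import CohereRerank

# Retrieve a wider candidate set, then let Cohere re-order it before synthesis.
reranker = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=3)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)
print(query_engine.query("How much carb and sugar should I eat?"))
```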
