This repo shows how to leverage Elastic's search capabilities (both full-text and vector search) together with Google Cloud's generative AI models and Vertex AI features to create a new retail experience. With this repo you will:
- Create a Python Streamlit app with an intelligent search bar
- Integrate with PaLM 2 models and Vertex AI APIs
- Configure an Elastic cluster as a private data source to build context for LLMs
- Ingest data from multiple data sources (Web Crawler, files, BigQuery)
- Use Elastic's text_embeddings and vector search for finding relevant content
- Fine-tune the text-bison@001 foundation model via Vertex AI to handle specific tasks
- and more...
!!! NEW !!! A detailed step-by-step walkthrough of this repo is now available here (also usable for external workshops)
- Set up your Elastic cluster with ML nodes
- Install Python on your local machine. If using Homebrew on macOS simply use
brew install [email protected]
- (Optional) For better Python environment management use virtual envs. Create a folder for your project in your favourite location, enter it, and create a venv named "homecraftenv". You will then install all the required libraries only inside this venv instead of globally
python -m venv homecraftenv
- (Optional) If you created a virtual env in the previous step, activate it. Check here for the activation commands depending on your OS. For Unix or macOS use
source homecraftenv/bin/activate
- Clone this repo in your project folder.
git clone https://github.com/valerioarvizzigno/homecraft_vertex.git
- Install requirements needed to run the app from the requirements.txt file
pip install -r requirements.txt
- Install the gcloud SDK; it is needed to connect to the Vertex AI APIs (docs here). Follow the instructions at the link for your OS. If using Homebrew on macOS you can simply install it with
brew install --cask google-cloud-sdk
- Initialize gcloud and follow the CLI instructions. You will have to specify the Google Cloud project to work in
gcloud init
- Authenticate the Vertex AI SDK (it was installed with requirements.txt). More info here
gcloud auth application-default login
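Once authenticated, you can optionally verify that the Vertex AI SDK can reach your project with a minimal Python sketch like the one below. This is not part of the repo: the location and the prompt are arbitrary assumptions, and gcp_project_id refers to the environment variable exported later in this README.

```python
# vertex_check.py - minimal sketch (not part of the repo) to verify Vertex AI access.
# Assumes `gcloud auth application-default login` has been run.
import os

import vertexai
# on older SDK versions this import lives under vertexai.preview.language_models
from vertexai.language_models import TextGenerationModel

# gcp_project_id matches the environment variable exported later in this README
vertexai.init(project=os.environ.get("gcp_project_id", "<your-gcp-project>"), location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict("Say hello in one short sentence.", temperature=0.2, max_output_tokens=64)
print(response.text)
```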
- Load the all-distilroberta-v1 ML model into your Elastic cluster via the Eland client and start it. To run the Eland client you need Docker installed. An easy way to accomplish this step without a local Python/Docker installation is via Google's Cloud Shell.
git clone https://github.com/elastic/eland.git
cd eland/
docker build -t elastic/eland .
docker run -it --rm elastic/eland eland_import_hub_model --url https://<elastic_user>:<elastic_password>@<your_elastic_endpoint>:9243/ --hub-model-id sentence-transformers/all-distilroberta-v1 --start
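To confirm the model was imported and started before moving on, you can query the trained models stats, for example with the Python client as in the sketch below. This is only a convenience check: Eland stores Hugging Face models under an ID where "/" is replaced by "__", and cloud_id, cloud_user and cloud_pass refer to the environment variables exported later in this README.

```python
# check_model.py - sketch (not part of the repo) to confirm the imported model is running.
import os

from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id=os.environ["cloud_id"],
    basic_auth=(os.environ["cloud_user"], os.environ["cloud_pass"]),
)

# Eland imports Hugging Face models under an ID where "/" is replaced by "__"
model_id = "sentence-transformers__all-distilroberta-v1"
stats = es.ml.get_trained_models_stats(model_id=model_id)
for m in stats["trained_model_stats"]:
    state = m.get("deployment_stats", {}).get("state", "not deployed")
    print(m["model_id"], "->", state)
```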
- Index general data from a retailer website (I used https://www.ikea.com/gb/en/) with Elastic Enterprise Search's web crawler and name the index "search-homecraft-ikea" (for immediate compatibility with this repo's code, otherwise change the index references in all homecraft_*.py files). For better crawling performance, look up the sitemap.xml path inside the target web server's robots.txt file and add it to the Site Maps tab. Set a custom ingest pipeline named "ml-inference-title-vector", applied directly at crawl time, to enrich crawled documents with dense vectors: use the previously loaded ML model for inference with the "title" field as source and "title-vector" as the target field for the dense vectors.
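The "ml-inference-title-vector" pipeline is meant to be created through the Enterprise Search UI, but if you prefer scripting it, a rough equivalent via the Python client could look like the sketch below. Treat it only as an approximation: the UI-generated pipeline contains additional housekeeping processors, and the field names simply follow the conventions described above.

```python
# create_pipeline.py - rough sketch (not part of the repo) approximating the
# "ml-inference-title-vector" pipeline created by the Enterprise Search UI.
import os

from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id=os.environ["cloud_id"],
    basic_auth=(os.environ["cloud_user"], os.environ["cloud_pass"]),
)

es.ingest.put_pipeline(
    id="ml-inference-title-vector",
    processors=[
        {
            # run the imported text-embedding model on the "title" field
            "inference": {
                "model_id": "sentence-transformers__all-distilroberta-v1",
                "target_field": "ml.inference.title-vector",
                "field_map": {"title": "text_field"},
            }
        },
        {
            # copy the resulting embedding into the title-vector field (whose mapping is set in the next step)
            "set": {
                "field": "title-vector",
                "copy_from": "ml.inference.title-vector.predicted_value",
            }
        },
    ],
)
```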
- Before launching the crawler, set the mapping for the title-vector field on the index
POST search-homecraft-ikea/_mapping
{
"properties": {
"title-vector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "dot_product"
}
}
}
- Start crawling.
- Index the Home Depot products dataset into Elastic.
- Create a new empty index named "home-depot-product-catalog-vector" that will host the dense vectors (for immediate compatibility with this repo's code, otherwise change the index references in all homecraft_*.py files) and specify its mappings.
PUT /home-depot-product-catalog-vector
POST home-depot-product-catalog-vector/_mapping
{
"properties": {
"title-vector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "dot_product"
}
}
}
- Re-index the product dataset through the same ingest pipeline previously created for the web crawler. The new index will then have vectors embedded in its documents in the title-vector field.
POST _reindex
{
"source": {
"index": "home-depot-product-catalog"
},
"dest": {
"index": "home-depot-product-catalog-vector",
"pipeline": "ml-inference-title-vector"
}
}
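At this point both vector indices can be queried semantically: the app's retrieval step boils down to a kNN search that lets Elasticsearch embed the query text with the same model used at ingest time. The sketch below illustrates this (it is not part of the repo: k and num_candidates are arbitrary values, it requires Elasticsearch 8.7+ for query_vector_builder, and the actual queries in the homecraft_*.py files may be built differently).

```python
# knn_query.py - sketch (not part of the repo) of a semantic search against the product index.
import os

from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id=os.environ["cloud_id"],
    basic_auth=(os.environ["cloud_user"], os.environ["cloud_pass"]),
)

resp = es.search(
    index="home-depot-product-catalog-vector",
    knn={
        "field": "title-vector",
        "k": 5,
        "num_candidates": 50,
        # embed the query text with the same model used at ingest time
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "sentence-transformers__all-distilroberta-v1",
                "model_text": "paint primer",
            }
        },
    },
)

for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```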
- Leverage the BigQuery to Elasticsearch native Dataflow integration to move a sample e-commerce dataset into Elastic. Take a look at the tables available in this dataset within the BigQuery Explorer UI. Copy the ID of the "Order_items" table and create a new Dataflow job to move data from this BQ table to an index named "bigquery-thelook-order-items". You need to create an API key on the Elastic cluster and pass it, along with the Elastic cluster's cloud_id, user and password, to the job config. This new index will be used for retrieving user orders.
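Once the Dataflow job has completed, retrieving a user's order history is a plain filtered query on that index. A sketch follows (not part of the repo): it assumes the thelook order_items column names (user_id, order_id, product_id, sale_price, status) are preserved in the Elastic documents, and the user id is just an example value.

```python
# orders_query.py - sketch (not part of the repo) of fetching a user's past orders
# from the Dataflow-loaded index. Field names assume the thelook order_items schema.
import os

from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id=os.environ["cloud_id"],
    basic_auth=(os.environ["cloud_user"], os.environ["cloud_pass"]),
)

resp = es.search(
    index="bigquery-thelook-order-items",
    query={"term": {"user_id": 2}},  # example user id
    size=20,
)

for hit in resp["hits"]["hits"]:
    doc = hit["_source"]
    print(doc.get("order_id"), doc.get("product_id"), doc.get("sale_price"), doc.get("status"))
```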
- Set up the environment variables cloud_id (the Elastic Cloud ID - find it on the Elastic admin console), cloud_user and cloud_pass (the Elastic deployment's user credentials) and gcp_project_id (the GCP project you're working in). These variables are used inside the app code to reference the systems to communicate with (the Elastic cluster and the Vertex AI API in your GCP project)
export cloud_id='<replaceHereYourElasticCloudID>'
export cloud_user='elastic'
export cloud_pass='<replaceHereYourElasticDeploymentPassword>'
export gcp_project_id='<replaceHereTheGCPProjectID>'
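A quick sanity check that all four variables are set and both backends respond could look like the sketch below. It is not part of the repo: the "us-central1" location is an assumption, and the real wiring lives in the homecraft_*.py files.

```python
# env_check.py - sketch (not part of the repo) verifying the environment variables
# are set and that Elastic and Vertex AI are reachable.
import os

import vertexai
from elasticsearch import Elasticsearch

for var in ("cloud_id", "cloud_user", "cloud_pass", "gcp_project_id"):
    assert os.environ.get(var), f"environment variable {var} is not set"

es = Elasticsearch(
    cloud_id=os.environ["cloud_id"],
    basic_auth=(os.environ["cloud_user"], os.environ["cloud_pass"]),
)
print("Elasticsearch reachable:", es.ping())

vertexai.init(project=os.environ["gcp_project_id"], location="us-central1")
print("Vertex AI SDK initialized")
```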
- Fine-tune text-bison@001 via the Vertex AI fine-tuning feature, using the fine-tuning/fine_tuning_dataset.jsonl file. This will instruct the model to advertise the partner network when specific questions are asked. For more information about fine-tuning look at these docs
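The tuning dataset is plain JSONL where each line pairs an input_text prompt with the desired output_text, which is the format Vertex AI supervised tuning expects for text-bison@001. The small sketch below (not part of the repo) sanity-checks the file locally before launching the tuning job.

```python
# validate_tuning_dataset.py - sketch (not part of the repo) that checks each line of the
# tuning file has the input_text / output_text pair expected by Vertex AI supervised tuning.
import json

path = "fine-tuning/fine_tuning_dataset.jsonl"

with open(path, encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        missing = {"input_text", "output_text"} - record.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")

print("Tuning dataset looks well-formed.")
```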
- Run the Streamlit app
streamlit run homecraft_home.py
---USE THE HOME PAGE FOR BASE DEMO---
Try queries like:
- "List the 3 top paint primers in the product catalog, specify also the sales price for each product and product key features. Then explain in bullet points how to use a paint primer". You can also try asking for related URLs and availability --> leveraging private product catalog + public knowledge
- "Could you please list the available stores in the UK?" --> it will likely use crawled docs
- "Which are the ways to contact customer support in the UK? What is the webpage url for customer support?" --> it will likely use crawled docs
- "Please provide the social media accounts info from the company" --> it will likely use crawled docs
- "Please provide the full address of the Manchester store in the UK" --> it will likely use crawled docs
- "Are you offering a free parcel delivery?" --> it will likely use crawled docs
- "Could you please list my past orders? Please specify the price for each product" --> it will search the BigQuery order dataset
- "List all the items I have bought in my order history in bullet points"
---FOR A DEMO OF FINE-TUNED MODEL USE "HOMECRAFT FINETUNED" WEBPAGE---
Try "Anyone available at Homecraft to assist with painting my house?". Asking this question in the fine-tuned page should suggest to go with Homecraft's network of professionals
Asking the same to the base model will likely provide a generic or "unable to help" answer.