connect an esp32 cam to llm and back again
The project connects an esp32 cam to an LLM: the camera captures an image, the LLM receives it together with the prompt "describe this picture?", and the returned description is then converted to speech. The LLM runs locally on my machine via the ollama library with the LLaVA model, an open-source model that is good at image analysis. I'm using an MSI laptop with an NVIDIA RTX 4070 GPU, which has 8GB of VRAM. The LLaVA model can be installed directly with ollama.
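The host-side LLM call can be sketched with the ollama Python client. This is a minimal sketch, assuming the `ollama` package is installed via pip and the `llava` model has been pulled with `ollama pull llava`; the helper names `to_base64` and `describe_image` are illustrative, not names from the repo:

```python
import base64


def to_base64(image_bytes: bytes) -> str:
    """Encode raw JPEG bytes as base64 text for the ollama chat API."""
    return base64.b64encode(image_bytes).decode("ascii")


def describe_image(image_bytes: bytes, model: str = "llava") -> str:
    """Ask a locally running ollama server to describe a single image."""
    import ollama  # assumes `pip install ollama` and a running ollama server

    response = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": "describe this picture?",
            # the client also accepts raw bytes or file paths here
            "images": [to_base64(image_bytes)],
        }],
    )
    return response["message"]["content"]
```

The returned text would then be handed off to a text-to-speech step before being streamed back to the esp32 cam.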
Thanks to pcbway for the pcbs. https://www.pcbway.com
- llm_2_esp32_cam.py: this file runs on my local machine.
- esp32_2_llm: this contains the internet-radio test sketch from the esp32 audio i2s library. Get this working first to troubleshoot any i2s difficulties.
- esp32_2_llmv4: this contains the code running on the esp32 cam that communicates with the llm.
- fritzing: fritzing file and gerbers for pcb.
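The camera-to-laptop leg of the trip can be sketched as a tiny host-side HTTP endpoint. This is a sketch only, assuming the esp32 cam POSTs one JPEG frame over Wi-Fi; the `/`-anything route, the `is_jpeg` check, and the placeholder reply are hypothetical, not the repo's actual protocol:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_jpeg(data: bytes) -> bool:
    """Cheap sanity check: JPEG data starts with the 0xFFD8 marker."""
    return data[:2] == b"\xff\xd8"


class FrameHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The esp32 cam is assumed to POST a single JPEG frame per request.
        length = int(self.headers.get("Content-Length", 0))
        frame = self.rfile.read(length)
        if not is_jpeg(frame):
            self.send_error(400, "expected a JPEG frame")
            return
        # Here the real script would query the llm and run text-to-speech,
        # then stream the audio back for the MAX98357 to play.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
```

To try it, run `HTTPServer(("0.0.0.0", 8000), FrameHandler).serve_forever()` and point the esp32 cam at the laptop's IP on port 8000.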
- esp32 cam: I'm using the AI Thinker board.
- MAX98357: i2s audio amplifier.
- speakers: 8 ohm or 4 ohm speakers. It's worth getting good ones. Cheaper ones are at aliexpress (https://www.aliexpress.com/item/1005006056014552.html). More expensive ones are at Amazon (https://www.amazon.com.au/dp/B01LN8ONG4).
- FTDI adapter: you'll need this to flash the esp32 cam if you're using the AI Thinker board, since it has no onboard USB.
- pcb or breadboard: I used both. I built on the breadboard first and then put everything on a pcb.
- oled: I used a 128 x 64 oled.
- 18650 battery
- 18650 WEMOS battery shield: see https://www.amazon.com.au/Be-Your-Mind-Expansion-Compatible/dp/B0CPDD985S for example.
- push button
- rocker switch
- jumper cables