-
Notifications
You must be signed in to change notification settings - Fork 707
fix(cam_hal): replace recursive cam_take() with bounded loop prevents stack-overflow in cam_task #758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix(cam_hal): replace recursive cam_take() with bounded loop prevents stack-overflow in cam_task #758
Conversation
@Fexiven a review would be appreciated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses a stack overflow issue in the camera driver by replacing the recursive cam_take() calls with a bounded loop.
- Replaces recursive frame fetching with a loop that tracks elapsed time.
- Introduces GDMA reset attempts (capped at 3) for ESP32-S3 and adjusts behavior accordingly.
@RubenKelevra please have a look at the copilot suggestion, this error in CI: https://github.com/espressif/esp32-camera/actions/runs/15981710752/job/45109710238?pr=758#step:4:824 LGTM otherwise |
I'd like to test it, but I'm not sure how to update the esp32_camera driver in ESPHome to use the version from your pull request. I tried doing it like this: esphome/esphome@a72405c — but it doesn't seem like anything actually changed. Could you please guide me on how to properly test your version? |
@turenkomv that is because the PR has not yet been merged. When that happens and new version is released, you will be able to grab it from the component manager |
– prevents stack-overflow in cam_task The old cam_take() used recursion to retry if * no JPEG EOI was found or * a NULL frame pointer was returned (GDMA glitch on ESP32-S3). Under heavy loss conditions this could overflow the cam_task stack and reboot the whole system. * Re-wrote cam_take() as a single while-loop – stack depth is now constant and independent of retry count. * Added strict timeout tracking (`remaining = timeout - elapsed`); function can never block longer than the caller’s timeout. * ESP32-S3 only * capped GDMA reset storms to 3 attempts (`MAX_GDMA_RESETS`) * logs one “giving up” warning, then yields (`vTaskDelay(1)`) to avoid busy-spin after hardware is wedged. * Non-S3 targets * emit early `ESP_LOGW` when a NULL frame ever appears, then yield one tick per loop to prevent CPU thrash. * Maintained existing JPEG-EOI trimming and YUV→GRAY memcpy paths; behaviour unchanged on successful frames. * Inline comment links to esp32-camera commit 984999f / issue espressif#620 for future context.
887b719
to
c4204d3
Compare
Woops, good catch, thank you! |
Thanks for your report, but I don't an error in your configuration. Note that this should not make the camera correctly record frames (yet), but only solve the crash by stack overflow. So it's expected that you see still "EV‑EOF‑OVF" messages, but no crash. Edit: Maybe it doesn't like building from a branch, try this tag instead: https://github.com/RubenKelevra/espressif_esp32-camera/releases/tag/fix_cam-task-stack-overflow-0.1 |
I tried setting the tag fix_cam-task-stack-overflow-0.1, but nothing changed. After a variable number of messages, I still get the same error:
Is it possible that the new driver version isn't actually being pulled into the build? log:
config: esphome:
name: m5stack-camera-1
friendly_name: M5Stack Camera 1
external_components:
# - source: github://pr#8843
# components: [ esp32_camera ]
# refresh: 0s
- source:
type: git
url: https://github.com/turenkomv/esphome
ref: dev
components: [ esp32_camera ]
refresh: 0s
esp32:
board: esp32-s3-devkitm-1
flash_size: 16MB
framework:
type: esp-idf
version: latest
sdkconfig_options:
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240: y
#CONFIG_CAMERA_JPEG_MODE_FRAME_SIZE_AUTO: n
#CONFIG_JPEG_MODE_FRAME_SIZE_CUSTOM: y
#CONFIG_CAMERA_JPEG_MODE_FRAME_SIZE: '2097152'
psram:
mode: octal
speed: 80MHz
# Enable logging
logger:
baud_rate: 0
# Enable Home Assistant API
api:
encryption:
key: !secret api_encryption_key
ota:
- platform: esphome
id: ota_esphome
password: !secret ota_password
wifi:
ssid: !secret wifi_ssid
password: !secret wifi_password
captive_portal:
esp32_camera:
name: M5Stack Camera 1
external_clock:
pin: GPIO11
frequency: 20MHz
i2c_pins:
sda: GPIO17
scl: GPIO41
data_pins:
- GPIO6 # D0
- GPIO15 # D1
- GPIO16 # D2
- GPIO7 # D3
- GPIO5 # D4
- GPIO10 # D5
- GPIO4 # D6
- GPIO13 # D7
vsync_pin: GPIO42
href_pin: GPIO18
pixel_clock_pin: GPIO12
reset_pin: GPIO21
resolution: FHD
jpeg_quality: 2
frame_buffer_count: 2
max_framerate: 3 fps
esp32_camera_web_server:
- port: 8080
mode: SNAPSHOT
light:
- platform: status_led
name: "Status LED"
pin:
number: GPIO14
inverted: true Also, a few times right after flashing, I got this error:
|
Yeah. There can't be a stack overflow happening with the patch, because there's no longer recursion happening. So the patch is not applied by ESPHome, for whatever reason. Try removing the override in esphome/components/esp32_camera/init.py:
Keep only the YAML override. Then clean and remove the folders |
With exactly same result 😔 |
@turenkomv Thanks. I'll investigate further where the second issue may be lying. |
@turenkomv let's continue the discussion here. I'm not very hopeful, that this is the fix for the issue, but it's worth a shot. |
Description
The camera driver’s ISR (
ll_cam_send_event
) writes incoming DMA/VSYNC events into a queue. When that queue overflows, it logs the message below and stops the camera:If a frame is incomplete (for example after such an overflow),
cam_take()
will recursively retry fetching a frame until it finds a JPEG end marker:Each recursive call consumes additional stack space. Because the camera task stack size defaults to only 2048 bytes:
repeated recursion after several “EV‑EOF‑OVF” messages can exhaust the stack of the
cam_task
. Once the guard bytes at the end of the task’s stack are overwritten, FreeRTOS reports a stack overflow, as seen in the logs reported by @turenkomv.So the stack overflow likely occurs when the camera generates bad frames (due to event queue overflow) and
cam_take()
repeatedly recurses while trying to recover.To fix this we switch to a loop instead:
remaining = timeout - elapsed
); function can never block longer than the caller’s timeout.MAX_GDMA_RESETS
)vTaskDelay(1)
) to avoid busy-spin after hardware is wedged.ESP_LOGW
when a NULL frame ever appears, then yield one tick per loop to prevent CPU thrash.Related
Fixes crash reported by @turenkomv here: esphome/esphome#8832 (comment)
Testing
The ESP32S I got seems to be less affected, but runs the new code fine: I've patched the version 2.0.15 currently used in ESPHome with my changes.
@turenkomv would you be so kind and would test this on your hardware?
Checklist
Before submitting a Pull Request, please ensure the following: