
Task issues #71

Open · PhuriousGeorge opened this issue on Feb 22, 2021 · 26 comments

Labels: bug (Something isn't working), documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@PhuriousGeorge

Currently running latest, but had similar issues in v0.9

As mentioned elsewhere, I've got a massive list of channels to add, and tasks keep freezing on me. There seem to be two different issues:

  1. Freezing: I had 800+ tasks in my queue listed as "immediate", but nothing was being executed.
  2. Task regeneration issue? I reset tasks to see if I could kick them back off and now I have 20 sources with only 3 tasks. 2 are playlists and the 3rd is a channel I just added.

The 2 playlists were queried, but the channel I just added is again listed as "running" with nothing actually being executed.

@meeb (Owner) commented Feb 22, 2021

Yeah, there's still possibly an issue creating tasks with signals which I can't identify, as it seems intermittent and I've never seen it myself in testing or in my own (admittedly smaller) deployments. The "reset tasks" button can only partially work due to the timeouts in the web interface: it has to trigger every signal again in a loop, so basically a save() on every media item. If you have several thousand, this can take over 60 seconds and just time out.

Can you try the reset-tasks CLI script that's in :latest? It does the same job as the "reset tasks" button but has no time limit and should be more reliable.

https://github.com/meeb/tubesync/blob/main/docs/reset-tasks.md

@PhuriousGeorge (Author) commented Feb 22, 2021

That's why I made the log request :). I've been trying to troubleshoot, and since I'd only added a small portion of my list, I've wiped the DB and started fresh twice but keep encountering the same issue. I've been adding smaller channels this time around instead of the 1000+ video channels and just hit the task problem again.

Ran the CLI command and ran into an error:

Rescheduling Index media from source "phuriousgeorge"
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py", line 413, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/background_task/tasks.py", line 43, in bg_runner
    func(*args, **kwargs)
  File "/app/sync/tasks.py", line 183, in index_source_task
    media.save()
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 754, in save
    force_update=force_update, update_fields=update_fields)
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 792, in save_base
    force_update, using, update_fields,
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 873, in _save_table
    forced_update)
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/base.py", line 926, in _do_update
    return filtered._update(values) > 0
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 803, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/sql/compiler.py", line 1522, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/sql/compiler.py", line 1156, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.7/dist-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py", line 413, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: database is locked

It hangs for quite a while, then throws "Failed to retrieve tasks. Database unreachable." Eventually it seems to give up and tosses the above error in the CLI output.

I attempted the CLI command again and received the same as above, except I'm still waiting for it to complete. It appears completely frozen again, with "Failed to retrieve tasks. Database unreachable." as the last thing emitted in the container log.

Edit: after 5 minutes of hanging with only 5 more reports of "Failed to retrieve tasks. Database unreachable.", I interrupted the execution.

@meeb (Owner) commented Feb 22, 2021

Ah, that's a DB write occurring in the container and via the CLI script at the same time, which SQLite can't handle.

Try stopping the container with docker stop tubesync, then try your docker run ... again but with:

docker run \
  -ti \
  -v /path/to/your/config:/config \
  -e PUID=1000 \
  -e PGID=1000 \
  --entrypoint=/usr/bin/python3 \
  ghcr.io/meeb/tubesync:latest \
  /app/manage.py reset-tasks

Edit as required.

That will run the reset-tasks command on its own without the usual parent container potentially locking the db at the same time. If that completes without error you should be able to docker start ... the container again normally (or docker run ... if you removed it) and your missing tasks should be recreated.

You just need reset-tasks on the CLI to run with exclusive write access to the SQLite database.
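
If you deployed with docker-compose rather than docker run, a rough equivalent would be something like the following. This is only a sketch of the same idea (service name tubesync is an assumption; adjust to your own compose file): stop the normal container so nothing else holds the SQLite file, run reset-tasks once in a throwaway container, then start the service again.

docker-compose stop tubesync
docker-compose run --rm \
  --entrypoint /usr/bin/python3 \
  tubesync \
  /app/manage.py reset-tasks
docker-compose start tubesync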

@meeb (Owner) commented Feb 22, 2021

Thinking about it, this should probably be in the advanced docs as well.

@meeb added the bug, documentation and question labels on Feb 22, 2021
@PhuriousGeorge (Author)

How long should it run? I've not gotten any stdout messages since "Resetting all tasks...". It's been running for 15 minutes. Is it just establishing tasks or actually running them all?

@meeb (Owner) commented Feb 22, 2021

It's not indexing or doing any request work; it's just looping over all the media items one by one and recreating all the expected tasks, so if you have a very large number of media items it might take quite a while. It's safe to just leave it running until it finishes with a "Done" though. It takes about 30 seconds with my 5000 test media items, so I guess you have quite a lot more than that.

@meeb (Owner) commented Feb 22, 2021

https://github.com/meeb/tubesync/blob/main/tubesync/sync/management/commands/reset-tasks.py

... if you're curious as to the steps. It's slow because, to trigger the required signals that create tasks, you have to call save() on every Source, which in turn loops over every Media object linked to the Source and calls that object's save(), resulting in a lot of queries.
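
For anyone skimming, here is a rough paraphrase of that flow, not the actual file linked above. The Source and Media models (in sync.models) and the django-background-tasks Task model come from the repository, but the exact field names and signal wiring may differ, so treat it as a sketch only:

# Paraphrased sketch of a reset-tasks style management command.
from django.core.management.base import BaseCommand

from background_task.models import Task
from sync.models import Source, Media


class Command(BaseCommand):
    help = 'Deletes all scheduled tasks and recreates them via model signals'

    def handle(self, *args, **options):
        self.stdout.write('Resetting all tasks...')
        # Drop whatever is currently queued.
        Task.objects.all().delete()
        for source in Source.objects.all():
            self.stdout.write(f'Rescheduling tasks for source: {source}')
            # Saving the Source re-fires its post_save signal, which
            # recreates the "Index media from source" task. The per-media
            # saves below are what make this slow: one save() (and
            # therefore several queries) per media item.
            source.save()
            for media in Media.objects.filter(source=source):
                media.save()
        self.stdout.write('Done')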

@PhuriousGeorge (Author)

Altogether it took 36 minutes to execute. I currently have 3318 media items indexed and 1002 tasks were created. I can't check anything further from the task page because 500 errors are being thrown from workers chewing through that queue and locking the DB up :P

@PhuriousGeorge (Author)

Seems there are still some workers tripping over each other?

2021-02-22 09:56:31,334 [tubesync/INFO] Indexed media: 45-drives / tTFFMyY_N5w
Failed to retrieve tasks. Database unreachable.
2021-02-22 09:56:32,141 [tubesync/INFO] Indexed media: 45-drives / k5ftX9Iwvvg
2021-02-22 09:56:32,141 [tubesync/INFO] Deleting completed tasks older than 7 days (run_at before 2021-02-15 15:56:32.141614+00:00)
Failed to retrieve tasks. Database unreachable.
Failed to retrieve tasks. Database unreachable.
Failed to retrieve tasks. Database unreachable.
Failed to retrieve tasks. Database unreachable.
Failed to retrieve tasks. Database unreachable.
Failed to retrieve tasks. Database unreachable.
Failed to retrieve tasks. Database unreachable.

It then recovers and moves on to the next channel, but keeps tossing the DB errors every 10 tasks or so.

@meeb (Owner) commented Feb 22, 2021

Is your TUBESYNC_WORKERS env var set to 1? If not, lower it to 1 and see if that helps. If it is already 1 or unset then I've no idea; in theory that shouldn't be possible, given that's a single blocking worker thread performing single-threaded writes. This should probably fix itself after work on #40 goes ahead as well. I'm assuming your /config dir isn't on an NFS or SMB mount or other usual culprits for SQLite write access issues, of course.

@PhuriousGeorge (Author)

I had upped workers to 4 on a previous DB and had issues, but this iteration is purely default. No NFS/SMB. Just an SSD passed through to a VM with a directory passed to Docker.

I had read that issue/enhancement previously and I guess I missed the Open tag lol. At this point, I can only assume it's the worker in-fighting that causes the task freeze-ups.

@meeb (Owner) commented Feb 22, 2021

Yes. that "Failed to retrieve tasks. Database unreachable.," is thrown by django-background-tasks failing to lock a task, likely because another worker acquired the task first in a race condition or the SQLite database was locked for writing. It's a generic error:

https://github.com/arteria/django-background-tasks/blob/master/background_task/tasks.py#L254

Since the change to flat indexing, which is much faster, you probably don't need more than one worker anyway, so setting it to 1 is fine.
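
For reference, pinning the worker count is just an environment variable on the container. A sketch of the usual docker run with it added (the other flags, paths and port follow the project README and may differ from your own setup):

docker run \
  -d \
  --name tubesync \
  -v /path/to/your/config:/config \
  -v /path/to/your/downloads:/downloads \
  -e PUID=1000 \
  -e PGID=1000 \
  -e TUBESYNC_WORKERS=1 \
  -p 4848:4848 \
  ghcr.io/meeb/tubesync:latest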

@meeb (Owner) commented Feb 22, 2021

After some testing with 50k generated media items, SQLite can't properly manage the concurrent writes required. This could be mitigated with very careful thread management, but it's probably easier to just use the right tool for the job. I'll work on it in #72 - I would guess your very large deployment will continue to be twitchy until that's complete, unfortunately.
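
As a stop-gap only, and purely to illustrate the locking behaviour rather than describe a TubeSync setting, Django's SQLite backend accepts a busy timeout so a writer waits on a lock instead of raising "database is locked" straight away. A minimal sketch of that in a Django settings module (the database path here is a placeholder):

# Illustrative Django settings sketch, not a TubeSync configuration option.
# A longer busy timeout reduces the symptom but does not add real
# concurrent write support.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': '/config/db.sqlite3',  # placeholder path
        'OPTIONS': {
            'timeout': 30,  # seconds to wait on a locked database
        },
    },
}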

@PhuriousGeorge (Author)

Gotcha, I was wondering about that myself. I'm nowhere near 50k; I'm still at 3400 media items tracked over 20 sources and hitting the performance issues described above. Just wait until I add TED, with 100k+ videos itself.

@Code-Slave

It seems from my testing that once I get to around 2k media items it starts hitting this hard.

@meeb (Owner) commented Feb 24, 2021

I have 5-6k media items in my personal install with a single worker and SQLite with no issues whatsoever, but from the issues reported I can see it's common to hit errors in some deployments. I'm not entirely sure how people are triggering these database locking issues (assuming their SQLite databases aren't on network shares etc.), given I can't reproduce them with a single worker even when trying to, but generally I expect the advice in the future is just going to be "got database locking issues? migrate to Postgres" or similar. It'll be fixed before TubeSync comes out of pre-release at any rate. Thanks for being the early testers.

@Code-Slave

@meeb This is a really great app, which is why I keep banging on it. Sure, I do all the same stuff using scripts, but that doesn't score very highly on the GF acceptance factor or ease of use.

@meeb (Owner) commented Feb 24, 2021

Thanks! Feel free to keep mentioning issues as you find them, that's what the pre-release is for, provided you're fine with it being occasionally broken until v1.0.

@Code-Slave

I'm totally OK with that, that's why it's alpha.

@gunnyjenkins commented Mar 8, 2021

Currently experiencing a similar issue. I had 7300 videos over 26 sources downloaded before I ran into issues hitting my download cap on YouTube's side. It happened before, and shutting down TubeSync for a few days and then coming back and resuming seemed to fix it. So I decided to do that again, but this time when I came back and started things up, it shows that my tasks are supposed to run immediately, yet nothing is happening. It has just stayed the same for the past few hours. I've read the thread and know that you're aware of the issue; I was just posting that I'm having a similar problem since you said you haven't been able to recreate it.
[screenshot attached]

@meeb (Owner) commented Mar 9, 2021

Thanks for the anecdotal confirmation of the issue. I suspect this is going to fix itself once a bunch of other issues get worked on (like #93 and #72 etc.). Obviously, setting up TubeSync somewhere with 50 large channel sources and waiting until something breaks isn't that efficient a method of testing, so I've still not directly seen this myself yet :)

@PhuriousGeorge (Author)

That's what I'm here for haha!

@amandle commented Mar 30, 2021

Confirming that this is happening for me as well. I have 767 tasks scheduled to run immediately and no running tasks. I have tried various worker values (1, 2 and 4), but it doesn't seem to have changed anything. I have 34 sources and am running on Docker on unRAID. Thanks for building this, it seems super useful!

@ExodusC commented Aug 8, 2021

Wanted to add that I've experienced this a few times as well, running on a deployment that had both an external DB server (MariaDB) and TUBESYNC_WORKERS=1 set up before ever adding any media.

As discussed earlier, I think each time I actually triggered the issue by trying to use the Reset Tasks button in the GUI. It seems like after the timeout, some sources get into the broken state where they no longer even generate a task to index, etc.

I've been able to resolve it each time by getting onto a local terminal session in the container and running python3 manage.py reset-tasks as suggested above.
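
For anyone unsure how to run that without the GUI, a minimal example, assuming the container is simply named tubesync:

docker exec -ti tubesync python3 /app/manage.py reset-tasks

Note that with the default SQLite database meeb's earlier advice still applies: stop the normal container and run reset-tasks on its own so nothing else is writing to the database. Running it in-place like this is mainly safe with an external database backend such as the MariaDB setup described here.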

Since it seems like there are quite a few of us running TubeSync with large numbers of sources/media, and task-related issues have been the biggest problem I've had, these are my suggestions based on my experience:

  • Use an external database server (this alleviated all the previous issues I had with SQLite and DB lockups; I have been using MariaDB in another container as I encountered the issues with Postgres described in: Error "set_session cannot be used inside a transaction" using postgres (tag: latest) #135 - a rough sketch of this setup follows at the end of this comment)
  • Only run reset-tasks in the container directly (rather than the Reset Tasks GUI option)
  • Allocate enough memory to the container (I think I had some weird behaviour with lower memory limits set; there is some discussion on high memory usage in: Possible memory leak. #103)
  • Set the TUBESYNC_WORKERS environment variable to 1 (I had it set based on this thread; maybe not necessary with an external DB?)
  • Run the database and the TubeSync /config directory on solid-state storage (this dramatically improved TubeSync performance thanks to the low latency of the SSDs)

Hopefully in the future these tweaks won't be necessary, but I thought I'd share my experience for anyone playing with the current builds using lots of sources with a lot of media.
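
To illustrate the external-database suggestion above, here is a rough docker run sketch. The DATABASE_CONNECTION variable follows the project's external-database docs as I understand them, and the names, paths, versions and passwords are all placeholders, so adjust everything to your own environment:

docker network create tubesync-net

docker run -d --name tubesync-db \
  --network tubesync-net \
  -v /path/to/mariadb:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=changeme \
  -e MYSQL_DATABASE=tubesync \
  -e MYSQL_USER=tubesync \
  -e MYSQL_PASSWORD=changeme \
  mariadb:10.5

docker run -d --name tubesync \
  --network tubesync-net \
  -v /path/to/config:/config \
  -v /path/to/downloads:/downloads \
  -e PUID=1000 \
  -e PGID=1000 \
  -e TUBESYNC_WORKERS=1 \
  -e DATABASE_CONNECTION=mysql://tubesync:changeme@tubesync-db:3306/tubesync \
  -p 4848:4848 \
  ghcr.io/meeb/tubesync:latest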

@meeb (Owner) commented Aug 8, 2021

Thanks for the report and testing. I'm fiddling with a full migration of the background tasks system to Celery, but it's a big change and taking a while, primarily around making sure I don't break everyone's installs with a botched automated data migration. That Postgres issue should be solved sooner, though! There is, alas, very little that can be done about the quite high memory usage while indexing large channels, as this causes the upstream libraries the worker uses to store a lot of information in memory. However, the workers do free this memory after indexing when using the new workers, so that should eventually improve to burst usage rather than permanent usage.

@SmartPhoneLover

> Confirming that this is happening for me as well. I have 767 tasks scheduled to run immediately and no running tasks. I have tried various worker values (1, 2 and 4), but it doesn't seem to have changed anything. I have 34 sources and am running on Docker on unRAID. Thanks for building this, it seems super useful!

Did you manage to run the 'reset-tasks' command (CLI) on unRAID? Just asking, because I'm not really sure how to call it from the container without using the web GUI button (reset tasks).
