Semaphore creation fails due to no space left #18

Closed
brownp2k opened this issue Dec 22, 2020 · 17 comments

Comments

@brownp2k
Contributor

We experienced Apache being killed (SIGSEGV), apparently due to this:
[Sun Dec 20 03:45:03.522921 2020] [oauth2:error] [pid 8085] oauth2_ipc_sema_post_config: sem_open() failed to create named semaphore /zzo-sema-8085.0x564b89a996e0: No space left on device (28)

It looks like oauth2_ipc_sema_post_config only frees the name before creating a new semaphore.

From the looks of it, a new semaphore file is created at least every 10 minutes, and there are 5 associated "sem.zzo" files per main semaphore file. I don't see any old files being cleaned up.
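
To check whether these files really accumulate over time, a small script can count the entries in /dev/shm grouped by name prefix. This is a minimal sketch and not part of liboauth2: the helper name `count_shm_entries` is hypothetical, and the prefixes are assumptions taken from the file names mentioned in this thread (on Linux, glibc exposes a named POSIX semaphore `/name` as `/dev/shm/sem.name`).

```python
import os
from collections import Counter

# Prefixes taken from the file names reported in this thread; adjust as needed.
PREFIXES = ("sem.zzo", "zzo-shm-")


def count_shm_entries(directory="/dev/shm", prefixes=PREFIXES):
    """Count directory entries per known prefix; everything else is 'other'."""
    counts = Counter()
    for name in os.listdir(directory):
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            # No known prefix matched this entry.
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # /dev/shm only exists on Linux-like systems; skip quietly elsewhere.
    if os.path.isdir("/dev/shm"):
        for prefix, n in sorted(count_shm_entries().items()):
            print(f"{prefix}: {n}")
```

Running this at intervals (for example from cron) and comparing the counts should show whether the roughly 10-minute growth pattern persists.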

@zandbelt
Member

which platform are you on?

@brownp2k
Contributor Author

CentOS 7

This is running a source build that contains the fix you put in for handling mod order.

@zandbelt
Member

oauth2_ipc_sema_post_config is not supposed to be called twice; what threading model (mpm) are you using?

@brownp2k
Contributor Author

Server MPM:     prefork
  threaded:     no
    forked:     yes (variable process count)

@zandbelt
Member

can you try with worker or event for comparison?

zandbelt added a commit that referenced this issue Dec 23, 2020
to prevent multi-process crash; see #18

Signed-off-by: Hans Zandbelt <[email protected]>
@zandbelt
Member

I also applied a should-be-fix and tagged 1.4.0.1

@brownp2k
Contributor Author

I'm checking to see if it's possible to run with worker or event as it isn't a machine I control.

@zandbelt
Member

you can also skip that and test the updated master of liboauth2

@brownp2k
Contributor Author

Have been running with liboauth2 1.4.0.1 for about 3 hours now, and du -hs /dev/shm is showing a 0 size. Running ls -lsah /dev/shm currently shows 33 zzo-shm-* files that are all 7.9M in size. And finally, df -h shows only 40K used.

I'll check again in the morning, but it seems that 1.4.0.1 has fixed the issue.

@brownp2k
Contributor Author

It was running the next morning (Dec 24) but upon checking httpd this morning (Dec 28) it appears that it crashed due to SIGSEGV yesterday morning at 3am. Checking /dev/shm shows 788 files that are all 7.8M in size, yet du shows 0 and df shows 40K. Nothing in the log, and nothing in ABRT like previous crashes.

@zandbelt
Member

zandbelt commented Dec 28, 2020

ow, can you try to make it core dump or run it in gdb?
or maybe share your setup with me (DM) so I can try and run/reproduce

@brownp2k
Contributor Author

Unfortunately, this is happening on our production machine so I can't readily share that setup. I've been trying to reproduce the issue in a CentOS 7 VM and haven't had any luck yet...I'm unsure whether it's an Apache-specific setup thing that I'm just not triggering in the same way or something else that is more machine/system specific.

@brownp2k
Contributor Author

brownp2k commented Dec 29, 2020

After some more digging, I think the 3am "crash" on Dec 27 was a red herring. Log rolling activated, which triggered a graceful restart, which in turn triggered the "graceful restart resource issue" mentioned here: OpenIDC/mod_oauth2#7 (comment)

However, the 788 files in /dev/shm are all still there, but maybe that isn't as bad as it seems since df and du don't register them?
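
One plausible explanation for the mismatch, offered as an assumption rather than a confirmed diagnosis: if the zzo-shm-* files are shared memory segments that are sized up front but mostly never written to, they are sparse, so ls reports their apparent size while du and df only count the pages actually allocated. The sketch below compares the two figures; `shm_usage` is a hypothetical helper name, and the 512-byte unit for `st_blocks` is what Linux uses.

```python
import os


def shm_usage(directory="/dev/shm", prefix="zzo-shm-"):
    """Return (file_count, apparent_bytes, allocated_bytes) for matching files."""
    count = apparent = allocated = 0
    for entry in os.scandir(directory):
        if not entry.name.startswith(prefix):
            continue
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        count += 1
        apparent += st.st_size           # the size that `ls -l` reports
        allocated += st.st_blocks * 512  # the space that `du` counts (512-byte units)
    return count, apparent, allocated


if __name__ == "__main__":
    # /dev/shm only exists on Linux-like systems; skip quietly elsewhere.
    if os.path.isdir("/dev/shm"):
        n, apparent, allocated = shm_usage()
        print(f"{n} files, apparent {apparent} bytes, allocated {allocated} bytes")
```

If the allocated total stays near zero while the file count keeps growing, the files would be costing directory entries rather than memory, though a growing count would still suggest that old segments are not being unlinked.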

@brownp2k
Contributor Author

A possibly related issue I ran into this morning is that Apache failed to restart after performing a shared memory cleanup:
[Sat Jan 23 02:46:43.778903 2021] [core:emerg] [pid 26292] (28)No space left on device: AH00023: Couldn't create the rewrite-map mutex

Googling led to:
https://serverfault.com/questions/991946/no-space-left-on-device-ah00023-couldnt-create-the-mpm-accept-mutex-when-re?newreg=460432d6a1dd4d8d98adc3daecead8e1

Clearing out the listed apache semaphores based on that link's advice allowed Apache to restart without failing.
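
For reference, here is a sketch of that kind of cleanup, assuming the link's advice refers to System V semaphores as listed by `ipcs -s` (these do not show up in /dev/shm). The script parses `ipcs -s` output and prints, but does not run, the matching `ipcrm -s` commands. The column layout is assumed to be that of util-linux `ipcs`, the owner name `apache` is an assumption about the account httpd runs under, and `parse_semids` is a hypothetical helper name.

```python
import subprocess


def parse_semids(ipcs_output, owner):
    """Extract semaphore ids owned by `owner` from `ipcs -s` style output."""
    semids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows are assumed to look like: key semid owner perms nsems
        if len(fields) == 5 and fields[0].startswith("0x") and fields[2] == owner:
            semids.append(int(fields[1]))
    return semids


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["ipcs", "-s"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        out = ""  # ipcs missing or failed: nothing to report
    for semid in parse_semids(out, owner="apache"):
        # Print the command instead of executing it, so it can be reviewed first.
        print(f"ipcrm -s {semid}")
```

Printing the commands rather than executing them leaves room to confirm that httpd is stopped before anything is removed.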

@zandbelt
Member

ok, thanks for the additional info, hope to get to the bottom of this soon

zandbelt added a commit that referenced this issue Jan 30, 2021
@zandbelt
Member

can you try 7de0b49 ?

@brownp2k
Contributor Author

brownp2k commented Feb 1, 2021

A quick test this morning shows that 7de0b49 allows httpd to be restarted without any apparent issues, and it also appears that zzo-shm-* files are no longer being created in /dev/shm.
