Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow preads from non-owner processes #9250

Closed
wants to merge 1 commit into from

Conversation

nickva
Copy link
Contributor

@nickva nickva commented Dec 26, 2024

At the C API level pread [1] can be called from multiple threads. In Erlang, however, all the calls must be made from a single owner process.

In a busy concurrent environment that means all operations for a file descriptor must be serialized, and could pile up in the controller's mailbox having to wait for potentially slower operations to complete.

To fix it, add the ability to make raw concurrent pread calls from Erlang as well. File descriptor lifetime is still tied to the owner process, so it gets closed when owner dies. Other processes are only allowed to make pread calls, all others still require going through the owner.

Other file layers, like compression and delayed IO, still keep the previous behavior. They have their own get_fd_data/1 functions per layer which check controller ownership. Concurrent preads are not allowed in those layers.

In unix_prim_file.c the seek+read fallback would have required exposing a flag in Erlang in order to keep the old behavior since another process could see the temporarily changed position. However, before adding a new flag looked where pread might not be supported, and it seems most Unix-like OSes since Sun0S 5(Solaris 2.5) and AT&T Sys V Rel4 (so all modern BSD) seem to have it. So perhaps, it's safe to remove the fallback altogether and simplify the code? As a precaution kept a configure check with an early failure and a clear message about it.

This necessitates updating preloaded beam files so an OTP team member would have to take over the PR, if the idea is acceptable to start with. I didn't commit the beam files so any CI tests might not run either?

Issue: #9239

[1] https://www.man7.org/linux/man-pages/man2/pread.2.html

At the C API level pread [1] can be called from multiple threads. In Erlang,
however, all the calls must be made from a single owner process.

In a busy concurrent environment that means all operations for a file
descriptor must be serialized, and could pile up in the controller's mailbox
having to wait for potentially slower operations to complete.

To fix it, add the ability to make raw concurrent pread calls from Erlang as
well. File descriptor lifetime is still tied to the owner process, so it gets
closed when owner dies. Other processes are only allowed to make pread calls,
all others still require going through the owner.

Other file layers, like compression and delayed IO, still keep the previous
behavior. They have their own `get_fd_data/1` functions per layer which check
controller ownership. Concurrent preads are not allowed in those layers.

In unix_prim_file.c the seek+read fallback would have required exposing a flag
in Erlang in order to keep the old behavior since another process could see the
temporarily changed position. However, before adding a new flag looked where
pread might not be supported, and it seems most Unix-like OSes since Sun0S
5(Solaris 2.5) and AT&T Sys V Rel4 (so all modern BSD) seem to have it. So
perhaps, it's safe to remove the fallback altogether and simplify the code? As
a precaution kept a configure check with an early failure and a clear message
about it.

This necessitates updating preloaded beam files so an OTP team member would
have to take over the PR, if the idea is acceptable to start with. I didn't
commit the beam files so any CI tests might not run either?

Issue: erlang#9239

[1] https://www.man7.org/linux/man-pages/man2/pread.2.html
Copy link
Contributor

github-actions bot commented Dec 26, 2024

CT Test Results

    3 files    141 suites   50m 11s ⏱️
1 598 tests 1 549 ✅ 49 💤 0 ❌
2 307 runs  2 238 ✅ 69 💤 0 ❌

Results for commit f0fb6e7.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@nickva
Copy link
Contributor Author

nickva commented Dec 26, 2024

Not sure about this approach and would like some feedback from the OTP team, so I didn't add any tests or documentation yet.

  • I suspect file_handle_wrapper state machine logic might need updated?

  • Could this approach work for pwrites as well? With the read buffer reset on pwrite calls, it might be tricky, but I saw there was a read buffer try_lock call, maybe something can be done there too...

So far just tried a simple shell check:

(concurrent-preads) % ./bin/cerl
Erlang/OTP 28 [DEVELOPMENT] [erts-15.2] [source-f4caf91470] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1]

1> {ok, Fd} = file:open("README.md", [raw, read, binary]).

{ok,{file_descriptor,prim_file,
                     #{handle => #Ref<0.2159425938.1955201055.257383>,
                       owner => <0.87.0>,
                       r_buffer => #Ref<0.2159425938.1955201036.257561>,
                       r_ahead_size => 0}}}

2> catch file:pread(Fd,2,3).
{ok,<<"[Er">>}

3> spawn(fun() -> erlang:display(file:pread(Fd,2,3)) end).
{ok,<<"[Er">>}
<0.91.0>

4> file:close(Fd).
ok

5> catch file:pread(Fd,2,3).
{error,einval}

6> spawn(fun() -> erlang:display(file:pread(Fd,2,3)) end).
{error,einval}
<0.95.0>

@jhogberg
Copy link
Contributor

Thanks for the PR!

  • I suspect file_handle_wrapper state machine logic might need updated?

Definitely. It's not safe if there can be more than one operation in flight. If there is a concurrent operation of any kind (even "safe" ones like fstat or pread), closing must be delayed until the operation returns lest we risk races causing the wrong file descriptor to be used.

Unfortunately I have to close this PR since we can't accept it; the file module interface is too much of a mess to carve out more exceptions like this. However, I've got a major refactor planned in the backlog that will address the underlying problem by allowing all thread-safe calls to be made concurrently. The idea is to introduce a new interface altogether (working name fs) where the old stuff like compressed files, "file owners," remote files, list mode, and so on aren't a thing, and platform-specific things will be broken out into platform-specific modules rather than trying to cram everything under the same hood. I'll lobby for bumping the priority of this work :-)

@jhogberg jhogberg closed this Dec 27, 2024
@nickva
Copy link
Contributor Author

nickva commented Dec 30, 2024

Thanks for taking a look, John!

Yeah it makes sense about closing it. I didn't want to start messing with the state machine, it looked like it needed expanding with a new set of transitions, and I obviously didn't know what I was doing :-)

However, I've got a major refactor planned in the backlog that will address the underlying problem by allowing all thread-safe calls to be made concurrently.

I like the idea of a new interface. It seems we should be able to take advantage of inherent thread safety of some of the posix api calls. For us (Apache CouchDB) it's not just theoretical improvement, we see frequent process mailbox back-ups for the one single Fd owner. If this bumped the internal priority of the issue even a tiny bit, we're very appreciative!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants