Support for GNU Make jobserver (alternative implementation) #104
4aabf9c to bdf99fa
(rebased on top of #106, which provides "give back tokens on signals" behavior for free)
Sorry to be very explicit, but I would like a clear answer on whether the project is still maintained.
Extract it from jobwork() so that build() can call it on a signal. Signed-off-by: Paolo Bonzini <[email protected]>
Keep the system clean by propagating SIGTERM to all children, and by not starting new jobs on both SIGTERM and SIGINT.

The only tricky bit is that previously fd[i].revents was used to skip both jobs that are not in use and jobs that did not have output; that's because negative file descriptors do not cause POLLNVAL and therefore fd[i].revents is zero for inactive jobs as well. But because all jobs must be killed, build() now has to check fd[i].fd == -1 explicitly.

While at it, also clean up jobdone() by clearing job[i].edge; it's not nice to leave a dangling pointer in the jobs array, even if it's harmless.

Signed-off-by: Paolo Bonzini <[email protected]>
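To illustrate the revents point in the commit message above, here is a minimal sketch, not the code from this PR. The helper names countactive() and pollidleandsilent() are made up for illustration. It relies only on the documented poll() behavior that a negative fd is ignored and gets revents == 0, which makes an idle slot indistinguishable from a running job that has printed nothing unless fd is compared against -1.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: count the slots that are really in use.
 * revents cannot answer this, so the descriptor itself is checked. */
static int
countactive(const struct pollfd *fds, int n)
{
	int i, active = 0;

	for (i = 0; i < n; ++i) {
		if (fds[i].fd != -1)
			++active;
	}
	return active;
}

/* Hypothetical demo: slot 0 is idle (fd == -1), slot 1 is a live pipe
 * with no data. Returns what poll() returned, or -1 if pipe() failed.
 * On success the caller owns fds[1].fd and should close it. */
static int
pollidleandsilent(struct pollfd fds[2])
{
	int p[2], ready;

	if (pipe(p) < 0)
		return -1;
	fds[0].fd = -1;        /* idle job slot */
	fds[0].events = POLLIN;
	fds[0].revents = 0;
	fds[1].fd = p[0];      /* running job without output */
	fds[1].events = POLLIN;
	fds[1].revents = 0;
	ready = poll(fds, 2, 0); /* timeout 0: do not block */
	close(p[1]);           /* closed only after polling, to avoid POLLHUP */
	return ready;
}
```

Both slots come back with revents == 0, so a loop that must signal every running job cannot use revents as its filter; it has to skip slots by testing fd == -1, as the commit message says.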
GNU Make has a neat feature called the jobserver protocol, where the top-level Make can allocate a specific number of job slots, and child makes can take slots to do work in. This was designed to stop the parallelisation problem where a top-level make -j10 may potentially spawn 10 separate sub-makes all with -j10 so there's now 100 parallel jobs.

However, it's also useful for resource control in systems which build multiple pieces of software at once. For example, Bitbake can build N different pieces of software at once, and each of those is passed a -jM flag. If each of these N tasks is compiling then that's N*M jobs so you don't want N or M to be too high, but if only 1 of N is building then you want M to be high. With the job server protocol there are N slots in total for all sub makes, so you can control the resource utilisation more accurately.

By supporting the jobserver protocol instead of just -j, Samurai can join in the resource pooling and builds can be more efficient.

Signed-off-by: Paolo Bonzini <[email protected]>
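For readers unfamiliar with the protocol, here is a rough client-side sketch; it is not taken from this PR. It assumes the pipe form of the interface, where MAKEFLAGS carries --jobserver-auth=R,W, a slot is taken by reading one byte from R, and it is given back by writing that same byte to W. The names parseauth(), slotacquire() and slotrelease() are hypothetical, and the older --jobserver-fds= spelling and the fifo: form are deliberately not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: pull the two descriptors out of a MAKEFLAGS-style
 * string. Returns 0 on success, -1 if the pipe form is not present. */
static int
parseauth(const char *flags, int *rfd, int *wfd)
{
	static const char key[] = "--jobserver-auth=";
	const char *p;

	if (!flags)
		return -1;
	p = strstr(flags, key);
	if (!p)
		return -1;
	if (sscanf(p + sizeof(key) - 1, "%d,%d", rfd, wfd) != 2)
		return -1;
	return 0;
}

/* Take a slot: one byte read from the pipe is one token.
 * The read blocks if rfd is a blocking descriptor and no token is free. */
static int
slotacquire(int rfd, char *tok)
{
	return read(rfd, tok, 1) == 1 ? 0 : -1;
}

/* Give the slot back by writing the same byte that was read. */
static int
slotrelease(int wfd, char tok)
{
	return write(wfd, &tok, 1) == 1 ? 0 : -1;
}
```

A client would typically call parseauth(getenv("MAKEFLAGS"), &r, &w) once at startup. Note that every client also owns one implicit slot that never passes through the pipe, which is the "free token" discussed in the review below.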
Signed-off-by: Paolo Bonzini <[email protected]>
return true;

got_token = tokenread(local_rfd);
if (!got_token && !e->reserve) {
I'm worried about the following case:
- samurai tries to run a job A, and does not get a token.
- tokenget signals to the jobserver thread that it wants a token, and marks the reserve flag of A.
- samurai enters poll.
- Some other job finishes and unblocks job B, adding it to the head of the work list.
- The jobserver thread receives a token and writes it to the pipe.
- samurai tries to run job B, and tokenget returns the token just written.
- samurai tries to run job A. tokenget sees that the reserve flag is already set, so it does not request another token.
- We don't start job A, even if there is another token available from the jobserver.
To solve this, it's almost sufficient to change this condition to if (!e->reserve), but I think this doesn't work due to the free token. If you changed the jobserver wait condition to while (pending_edges <= 1 && !done) as well, that might do the trick.
Another option would be to keep track of the work list tail in build.c to make it operate FIFO.
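The FIFO option could look roughly like the sketch below. It is only an illustration: struct item stands in for a build edge, and queueinit(), queuepush() and queuepop() are hypothetical names rather than functions from build.c. The tail is kept as a pointer to the last next field, so appending is O(1) and a job unblocked later (B) can no longer jump ahead of one that is already waiting for its token (A).

```c
#include <assert.h>
#include <stddef.h>

struct item {               /* hypothetical stand-in for a build edge */
	const char *name;
	struct item *next;
};

struct queue {
	struct item *head;
	struct item **tail;     /* address of the pointer to fill next */
};

static void
queueinit(struct queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Append at the tail instead of pushing at the head. */
static void
queuepush(struct queue *q, struct item *it)
{
	it->next = NULL;
	*q->tail = it;
	q->tail = &it->next;
}

/* Remove the oldest entry, or return NULL if the queue is empty. */
static struct item *
queuepop(struct queue *q)
{
	struct item *it = q->head;

	if (it) {
		q->head = it->next;
		if (!q->head)
			q->tail = &q->head; /* queue is empty again */
	}
	return it;
}
```

With this ordering, the token that arrives for A is consumed by A, because A is still at the front when the event loop next tries to start work.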
Yes, you're right! Making it FIFO is the simplest fix. Maybe another is adding a variable requested_tokens that is local to build.c.
I will take a look once I get round to testing and submitting the prerequisite SIGTERM patches.
Alternative implementation to #94.
The main advantage is that the integration with the build() event loop is very clean, as it simply uses a pipe to signal the availability of tokens. Interacting with the job server is entirely embedded within a new token.c file that implements a simple API; on top of this, the integration is about 20 lines of code.
On the other hand, token.c uses pthreads, which perhaps could be considered less appealing. Waiting for reviews. :)
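As a rough picture of the pipe-plus-thread idea described above, here is a simplified sketch; it is not the token.c from this PR. A helper thread blocks on the jobserver descriptor and forwards each token into a private pipe, so the event loop only needs to poll one more fd. The names struct forwarder, forward() and waittoken() are hypothetical. One important simplification: this thread forwards tokens greedily, whereas the real implementation asks for a token only when an edge is actually waiting for one.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

struct forwarder {
	int jobserver_rfd;  /* read end shared with the jobserver */
	int local_wfd;      /* write end of the private pipe */
};

/* Thread body: block on the jobserver and pass each token on.
 * Returns when the jobserver pipe reaches EOF or an error occurs. */
static void *
forward(void *arg)
{
	struct forwarder *f = arg;
	char tok;

	while (read(f->jobserver_rfd, &tok, 1) == 1) {
		if (write(f->local_wfd, &tok, 1) != 1)
			break;
	}
	return NULL;
}

/* Event-loop side: wait up to timeout milliseconds for a forwarded token.
 * Returns 1 with the token stored in *tok, 0 on timeout, -1 on error. */
static int
waittoken(int local_rfd, int timeout, char *tok)
{
	struct pollfd pfd = { .fd = local_rfd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout);

	if (n <= 0)
		return n;
	return read(local_rfd, tok, 1) == 1 ? 1 : -1;
}
```

In an actual event loop, the read end of the private pipe would be one more entry in the existing pollfd array rather than a separate poll() call. The cost of this design is the pthread dependency mentioned above.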