feat: add additional labels #472
Can you please include the specific labels used? It will be easier to discuss with a concrete example.
As mentioned, this might occur whether the runners are registered at the org or repo level. Background: we now register the runners at the org level, meaning they are visible to all repositories the GitHub app is configured for in an organization. The runner has the labels [self-hosted, large-runner]. We trigger workflow A in repository A using workflow dispatch. In the meantime, another workflow B, which had already been waiting 15 hours for a runner with the labels [self-hosted, large-runner], takes over this runner. I can see this directly in the CloudWatch logs, because in this case the workflow job failed and returned: Job FrontendDeployment is only available in repository B for workflow B.
That would be one of the cons of organization-level registration. You can bring up more runners to take over from the stolen ones by creating a new step function execution: open an existing one from the UI and hit the New Execution button. I'm still not sure what you originally meant with the labels. Where would the label be added to resolve this?
I assume you can have this issue no matter if it's an org-level or repo-level runner. Here's a solution proposal:
This would mean that runner labels and workflow labels can be different and you can attach additional workflow labels. I'll try to create a PR and test this to see if it could work.
That won't prevent other repositories from using those labels.
Well, it's not like people steal runners on purpose. You could use multiple tags and choose whatever you want.
btw if you know a better solution for this dilemma, please let me know :) |
Exactly. Have you considered creating multiple providers all using the same settings and image builder but with different labels? That feels like basically what you're asking for. Each runner will start with exactly the labels you want and for just the repositories you want.
Repository-level registration is the only thing I found. I do wish GitHub would let us aim specific runners at specific jobs, but they have no such option.
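The multiple-providers suggestion could look roughly like this with cdk-github-runners. This is only a sketch: the provider class, the `labels` property, and the package import path are taken from my understanding of the library and should be checked against its current API; the construct IDs and label values are made up for illustration.

```typescript
import { Stack } from 'aws-cdk-lib';
import { CodeBuildRunnerProvider, GitHubRunners } from '@cloudsnorkel/cdk-github-runners';

// Assumed to exist in the surrounding CDK app.
declare const stack: Stack;

// Sketch: one provider per label set. Each provider registers runners that
// start with exactly the labels a given repo/workflow targets, so jobs from
// other repositories requesting different labels cannot take them over.
const frontend = new CodeBuildRunnerProvider(stack, 'FrontendRunners', {
  labels: ['self-hosted', 'large-runner', 'frontend'],
});
const backend = new CodeBuildRunnerProvider(stack, 'BackendRunners', {
  labels: ['self-hosted', 'large-runner', 'backend'],
});

new GitHubRunners(stack, 'Runners', {
  providers: [frontend, backend],
});
```

The providers can share the same settings and image builder, so the per-label duplication stays cheap; the cost is that every distinct label set must be listed somewhere in the CDK code.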
No chance, far too many repos. This is no fun even when you have more than 10 repos.
What I struggle with is having to specify all those labels. Another option: add a schedule feature to start additional runners on a regular basis.
@kichik please give me feedback.
Wouldn't you have the same problem with your proposed solution? You'd have to specify all those labels somewhere. Maybe your CDK code can automatically generate it. The schedule option is something I have considered often, for many reasons. I am still not sure exactly what I want it to look like, or if it should even be a schedule and not some kind of monitoring system for stolen/missing runners. You can already define a schedule manually, if you need to. I have done something similar for one of my installations.
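Defining such a schedule manually could look like this in CDK: an EventBridge rule that periodically starts the runner-orchestrating state machine. The rate, the input shape (including the `labels` field), and the state machine reference are all assumptions for illustration; check what your deployed step function actually expects as input.

```typescript
import { Duration } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import { Construct } from 'constructs';

// Assumed to exist in the surrounding CDK app.
declare const scope: Construct;
declare const runnerStateMachine: sfn.IStateMachine;

// Sketch: start an extra runner on a fixed schedule, independent of webhooks,
// to compensate for stolen or missing runners.
new events.Rule(scope, 'ExtraRunnerSchedule', {
  schedule: events.Schedule.rate(Duration.minutes(30)),
  targets: [
    new targets.SfnStateMachine(runnerStateMachine, {
      input: events.RuleTargetInput.fromObject({
        labels: ['self-hosted', 'large-runner'], // hypothetical input field
      }),
    }),
  ],
});
```

The downside mentioned later in the thread applies here too: a fixed schedule keeps submitting work even when the cluster is already saturated, which can make a busy ECS/ASG situation worse.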
It doesn't happen all the time, right? And for most users it's quite OK how it works, even with org runners.
Hmm, that's a nice idea. But I can't do this statically in CDK code for all repos; I would get lost. Create a schedule to trigger a Lambda.
You can create a provider just for those special cases then. Isn't that basically what you were asking for? What am I missing?
Since you're registering on the organization level, the ...
Would you be interested in integrating this construct into your solution? Should I open a PR, or should I create a separate construct?
I do want something like this one day, but I don't think we have all the details figured out yet. For example, when using ECS with a limited ASG, a job might be pending for more than 10 minutes just because the cluster is busy. If we keep submitting more jobs, it may only make the situation worse. I do like that it doesn't go off just by the webhook. I have seen cases where the webhook was just not called and so runners were missing.
Please do share the code, if you can. I think I may have misunderstood the matching you were doing between jobs and the existing state machine. It might already cover the problem I mentioned.
@kichik I hope you have seen my invitation. I created a private repo for this.
I did. Thanks. |
After more than a week of running the scheduler step function, I found a clear correlation with the webhook Lambda function error message. The detailed error message is in CloudWatch.
The name uses GitHub webhook delivery id to remain unique.
Do you maybe have the same webhook hooked up more than once somehow? |
OK, then I assume it might be the API Gateway/Lambda retry mechanism...
I assume this really only happens when API Gateway (my internal customlambdaaccess implementation) does a retry... it seems this occurs for about 1% of the runner requests, and it's always the same correlation.
Can you tell what the initial failure is? It makes sense for the second webhook retry to fail with the step function ExecutionAlreadyExists message and return 502. But why would the first one fail and require a retry? Is it a timeout, maybe?
You're right, I should investigate this :) Will do... a timeout could be an option, yes.
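The delivery-id naming discussed above can be sketched as follows. The helper name `sanitizeExecutionName` is hypothetical, but the constraints it enforces (1-80 characters, no whitespace or most special characters) are the real Step Functions execution-name rules, and a second StartExecution with the same name does fail with ExecutionAlreadyExists, which is what makes a retried webhook delivery surface as a 502.

```typescript
// Hypothetical helper: derive a Step Functions execution name from the
// GitHub webhook delivery id. Execution names must be 1-80 characters
// and may not contain whitespace or most special characters, so anything
// outside a safe set is replaced with '-'.
function sanitizeExecutionName(deliveryId: string): string {
  const safe = deliveryId.replace(/[^A-Za-z0-9\-_]/g, '-');
  return safe.slice(0, 80) || 'unknown-delivery';
}

sanitizeExecutionName('e6f2a0c0 retry #2'); // 'e6f2a0c0-retry--2'
```

Because the name is derived from the delivery id, a retried delivery reuses the same execution name; the second start attempt then fails instead of launching a duplicate runner. That makes the retry idempotent, but it also means the retry can never succeed, so the real question remains why the first attempt failed.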
Hey @kichik,
we have an issue with runners on the org level (and I assume on the repo level as well).
When multiple jobs run in parallel, it seems the wrong job can get the runner, leaving some jobs still waiting for a runner.
I tried to add additional labels to the runner labels, but saw that this is blocked by the webhook script:
cdk-github-runners/src/webhook-handler.lambda.ts
Line 148 in 0fc4be5
Feature request:
I would like to have the possibility to add additional labels to a workflow:
[self-hosted, large-runner, additional-label-for-workflow, additional-label-for-repo, additional-label]
That would mean you only check for the required labels [self-hosted, large-runner]; if those match, you forward the job.
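The requested check amounts to a subset match instead of an exact match. A minimal sketch, assuming a hypothetical helper name (`labelsSatisfied` is not part of the actual webhook handler):

```typescript
// Hypothetical sketch of the proposed matching rule: a job is forwarded as
// long as the provider's required labels are a subset of the job's labels,
// so jobs may carry extra repo- or workflow-specific labels without being
// rejected by the webhook.
function labelsSatisfied(jobLabels: string[], requiredLabels: string[]): boolean {
  const job = new Set(jobLabels.map((l) => l.toLowerCase()));
  return requiredLabels.every((l) => job.has(l.toLowerCase()));
}

// Extra labels on the job no longer cause a rejection:
labelsSatisfied(
  ['self-hosted', 'large-runner', 'additional-label-for-workflow'],
  ['self-hosted', 'large-runner'],
); // true

// A job missing a required label is still rejected:
labelsSatisfied(['self-hosted'], ['self-hosted', 'large-runner']); // false
```

Note this only relaxes the webhook-side filter; GitHub itself still assigns any runner whose labels satisfy the job, so the extra labels narrow which jobs a runner accepts only if the runner is registered with them too.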