Local Monorepo Build Orchestration #1121
From the meeting today:
Here's a WIP proposal for the yaml structure to keep the conversation going:

```yaml
api: v0/workspace
# a set of package recipes can be used to build new versions
# of packages as needed by the orchestration process
recipes:
  # either globs or individual paths can be used to collect recipes
  - packages/*/*.spk.yml
  # these can either be platform recipes or package recipes that
  # might be referenced by builds in the workspace
  - platforms/*/*.spk.yml
```

```yaml
api: v1/platform
platform: dcc-platform
base:
  # the ilm workflow requires inheriting from multiple bases
  # so we update to support a list in this position, order matters
  - company-platform
  - exr-platform
# the requirements of a platform are used to generate
# 'ifAlreadyPresent' requests under build- and run-time
# another name for this concept might be 'constraints' or
# 'opinions' if we are concerned about the deviation of
# the structure from package requirements
requirements:
  # the entry structure takes inspiration from the designs from
  # https://github.com/spkenv/spk/pull/296
  - pkg: gcc/9.3.1
  # when building all packages from a platform, the value from the
  # primary pkg line is used to template package specs and so should
  # be a complete version number if that functionality is desired
  - pkg: gcc/9.3.1 # tries to template & build gcc/9.3.1
  # separate requests for use in the build and run components of this
  # platform are created from the same primary version but can be
  # overridden by any version range specifier as appropriate. To make the
  # above build process work, I think that we would need to avoid
  # accepting range specifiers in the main pkg line.
  - pkg: gcc/9.3.1
    atBuild: '>=9.3.1'
    atRuntime: '~9.3'
  # using false in either scenario stops the request from
  # being added to that component, and would also effectively
  # remove the request if it had come from the base
  - pkg: imath
    atBuild: false
```
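As a concrete (and entirely hypothetical) illustration of that last point, here is how a derived platform might drop a request inherited from one of its bases; the base platform name and the imath version below are made up for the example:

```yaml
# hypothetical base platform, invented for illustration
api: v1/platform
platform: studio-base
requirements:
  - pkg: imath/3.1.9
---
# a platform inheriting that base, but opting out of the inherited
# imath request in its build component via `atBuild: false`
api: v1/platform
platform: dcc-platform
base:
  - studio-base
requirements:
  - pkg: imath
    atBuild: false  # removed from the build-time requests, even though studio-base added it
```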
From the meeting today:
Okay, I'm rethinking the approach here a little bit - based on our discussion and just imagining the workflows:

```yaml
api: v0/workspace
recipes:
  # collect all of the recipes in the workspace
  - packages/**.spk.yaml
  # some recipes require additional information
  # which can be augmented even if they were already
  # collected above
  - path: packages/python/python2.spk.yaml
    # here, we define the specific versions that can
    # be built from a recipe
    versions: [2.7.18]
  - path: packages/python/python3.spk.yaml
    # we can use bash-style brace expansion to define
    # ranges of versions that are supported
    versions:
      - '3.7.{0..17}'
      - '3.8.{0..20}'
      - '3.9.{0..21}'
      - '3.10.{0..16}'
      - '3.11.{0..11}'
      - '3.12.{0..8}'
      - '3.13.{0..1}'
```

My goal is to get something going that will be able to build a couple of consistent sets of packages from the current repo as a start. In this setup, the platforms don't factor into this any more than simply being used as dependencies, as they already are, but I'm not confident that this is the final form of this feature yet, either.
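For reference, each brace range above is just shorthand for an explicit list of patch releases; for example, a smaller range like '3.9.{0..3}' would stand in for:

```yaml
versions:
  - 3.9.0
  - 3.9.1
  - 3.9.2
  - 3.9.3
```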
Here are a few thoughts/questions that might influence the design or point out missing features.
If we were to attempt to build packages across multiple nodes, where each machine has its own local spfs repository but shares a central origin repository, do we expose enough metadata and package details to make that possible? I don't know the answer to this one. In the abstract, it seems like an orchestration process would need to be able to query the following details:
An orchestration process could basically work by querying for the packages that have not yet been built, determining which ones have all of their dependencies already satisfied, and then running spk build commands for each of these packages on remote hosts. This would allow us to scale up to building packages in parallel across multiple nodes.

The builder node may not want to publish the package to
If we are shuffling

The orchestration process would perform these steps in a loop until no packages remain to be built or until a build error occurs. At this point the build is either Complete or Failed. This design implies that the query for the set of packages to build should be an incremental query that can be performed against the origin + local repository.

It doesn't seem like anything in the v0/workspace spec prevents this design, but are there any missing features or tooling that would be needed in order to make a fully distributed, multi-node build possible?
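To make the incremental-query idea a little more concrete, here is a purely hypothetical sketch of the state an orchestrator might derive from the origin + local repositories on each pass of that loop; none of these field or package names correspond to an existing spk output format:

```yaml
# illustration only: one pass of the "what remains to be built?" query
remaining:
  - pkg: openexr/3.1.5
    dependenciesSatisfied: true    # its imath dependency is already published
    assignedTo: build-node-02      # hypothetical remote builder host
  - pkg: alembic/1.8.5
    dependenciesSatisfied: false   # still waiting on openexr/3.1.5
    assignedTo: null
```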
Let me lay out my initial thoughts and we can see if there's a gap between this and what you are hoping for. I'm picturing that this is going to require a new command, something like
Also, your initial answer is correct: the package specs would remain buildable individually, just as they would have been before these changes.
Thanks for the details. That helped fill the gaps for me. Here's how I'm understanding this so far. If the namespace feature allows for the use of multiple namespaces for reading/resolving, then builds can be set up to resolve packages from both the "default" namespace and a build-specific namespace. As new packages are built they'll get published into the separate build-specific namespace of the

Another variation that namespaces unlock is when you want to rebuild everything completely from scratch without any chance of using packages from the "default" namespace. One way to do that would be to set up the builds so that only the build-specific namespace is used, thus preventing any packages from the "default" namespace from being used. That approach might not be very useful in practice without also being able to activate the "default" namespace for just its source packages, though. Rebuilding everything from source isn't a very common scenario, but it might highlight the utility of being able to enable just the source packages from a particular namespace for resolving, separately from its non-source packages. That seems like a more complex concept (for a less common use case) that can be layered over the core concept of namespaces later, though, so it's probably too soon to consider compared to the core idea of
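A minimal sketch of that two-namespace arrangement, purely to make the resolve/publish split concrete; none of these keys or namespace names exist in spk today, they are only an illustration:

```yaml
# illustration only: resolve from the shared namespace plus a
# build-specific one, and publish new builds into the latter
namespaces:
  resolveFrom:
    - default        # existing published packages
    - rebuild-2024   # hypothetical build-specific namespace
  publishTo: rebuild-2024
```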
Is your feature request related to a problem? Please describe.
At ILM, we have a distinct workflow where many packages are maintained in a single repo (not currently spk). Our developer and CI workflows operate on "collections" that construct a simple dependency DAG and then build, in dependency order, any packages that don't yet exist. Notably, the package recipes work for all possible versions of the package, so the version is provided by the collection, not the package spec.
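As a rough sketch of what such a version-agnostic recipe could look like once converted to spk, with the version injected by the collection/workspace rather than hard-coded in the spec; the {{ version }} placeholder and the exact fields shown here are assumptions for illustration only:

```yaml
# illustration only: the collection supplies `version`, so the same
# recipe stays valid for every version it is asked to build
pkg: python/{{ version }}
build:
  options:
    - var: abi
  script:
    - ./configure --prefix=/spfs
    - make -j
    - make install
```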
Describe the solution you'd like
We would like to convert these packages into spk packages and then have a way to orchestrate builds in the same way using spk.
Describe alternatives you've considered
We could hook spk into our existing system, e.g. build the DAG ourselves and then invoke spk, but resolving the dependencies properly in such a DAG is difficult because spk spec files are hard to introspect from the outside, especially once a set of var options enters the picture.