Better wheel hosting solution #3049
Yeah, multiple versions are pretty sticky: in jupyterlite, we're currently working through the growing pains of a major version of a "special" upstream ( It's not that bad, though, as we've been maintaining a "collapsed" warehouse-like file structure which doesn't have some of the features of the conda-like To that end: the eventual solution should:
The closest I've come to this is, no surprise here, the conda, conda-build, and ultimately conda-forge ecosystem. This might work today on anaconda.org: it will actually host wheels and provide a pip-compatible (but probably not a warehouse-compatible) API (see https://docs.anaconda.com/anacondaorg/user-guide/tasks/work-with-packages/#using-package-managers). It offers free (as in beer), (basically) unlimited package hosting for open source, as well as private packages (though that gets a bit hairy). There is an ongoing effort to provide an open source implementation of that site's capabilities at https://github.com/mamba-org/quetz, but to my knowledge it doesn't offer wheels. Of course... the real step there would then be to eschew the wheel ecosystem entirely and use the conda(-forge) ecosystem more directly. The double-edged sword of the "real"
@bollwyvl I think we want to stick to wheels. Most Python packages can make wheels automatically with
I salute your optimism!
No, they won't do that either, but a few thousand packages in, conda-forge has demonstrated that a community-based, distributed model with heavy automation on donated CI and donated cold storage can achieve substantial, reproducible things at scale, and keep them up to date with best practices.
Thanks for your summary of packaging on the Jupyter side; it's helpful! Emscripten-forge is going in the conda direction, with a planned contribution to conda-forge, and I think it's great that people are exploring this idea. On our side, we decided to go in the direction of wheels. Things have improved a lot there as well, with cibuildwheel and related projects. A lot of packages are on PyPI only, and people still need the ability to install packages from there (or from private repositories that are never going to be on conda-forge, particularly pure Python ones). As for hosting, I think we would rather rely on a community solution; so far we have a good agreement with the JsDelivr CDN, with no bandwidth limits.
As a data point, BeeWare uses an anaconda.org repository to host iOS binary wheels, and it's been working really well. Having a similar collection available for Pyodide/Emscripten wheels would be exceedingly helpful for some of BeeWare's packaging workflows.
Yes, I'm aware of that possibility, and I hear good things about it. For Pyodide we currently need more than wheels (there are also various .js files), for which we probably need JsDelivr in any case. Given the weight of Anaconda in this space, I'm also a bit reluctant to host our packages there, to avoid too much centralization (and PyScript might do something in that direction eventually, I guess). As for using it to host user-built wheels, certainly; we should probably mention this possibility. So in the end I'm not sure what the outcome of this issue will be. There seems to be a growing sentiment that we don't want to manage or review a service where people can arbitrarily upload packages. We still need JsDelivr in any case for JS files, and PyPI support seems within reach in the medium term. Better support for third-party hosting services is something we also need to improve in pyodide/micropip#62. So maybe the outcome is just to recommend wheel hosting services people can use (including anaconda.org).
Yes, that is what I was thinking in pyodide/micropip#62: support the standard PyPI APIs (the Simple API) and allow people to use alternative registries. Then people can choose any hosting solution they want until PyPI supports hosting Emscripten wheels.
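The Simple API mentioned here (PEP 503) is plain HTML: one anchor per file, with the filename as the link text. As a minimal, stdlib-only sketch of what consuming it involves, the parser below lists the wheels on one project page and resolves their URLs against the page URL. The class and function names, the sample page, and the example.invalid host are illustrative assumptions; this is not micropip's actual implementation.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleIndexParser(HTMLParser):
    """Collect (filename, absolute URL) pairs from a PEP 503 project page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data: str) -> None:
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "a" and self._href is not None:
            # The link text is the filename; the href may end in a
            # "#sha256=..." fragment, which urljoin keeps intact.
            filename = "".join(self._text).strip()
            self.links.append((filename, urljoin(self.base_url, self._href)))
            self._href = None


def list_wheels(page_html: str, base_url: str) -> list[tuple[str, str]]:
    """Return (filename, url) for every .whl link on a Simple API page."""
    parser = SimpleIndexParser(base_url)
    parser.feed(page_html)
    parser.close()
    return [(name, url) for name, url in parser.links if name.endswith(".whl")]
```

A real client would also verify downloads against the `#sha256=` fragment and might prefer the PEP 691 JSON variant of the API; both are left out to keep the sketch short.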
@rth Totally understood that you might not want to visibly use an Anaconda service for this; I mention it only because (a) BeeWare is using it for other purposes, and (b) if the constraint was having access to a free simple index hosting option, the option is there. In terms of my own personal wishlist, I acknowledge that there's a need for other non-wheel files to be hosted, but having the existing, officially published binary wheels available as a simple index would (AFAICT) do everything I'm currently looking for. Maintaining an unofficial mirror of the published Pyodide wheels is one of the options on the table, but I'd vastly prefer to avoid it if I can. Longer term, the ability for other users to upload wheels would also be great (though that moves into the territory of "build your own PyPI"), but in the short term, pip-compatible access to the wheels that already exist would be more than sufficient.
Since we're already generating a simple index, I agree that we should deploy it somewhere so you can use it.
From my discussions with PyPA members at PyCon, I am optimistic that we will likely be able to upload Pyodide wheels to PyPI a year from now.
The simple index is really simple, BTW; nothing really prevents us from exposing a simple index for the files we have already built. Unfortunately, JsDelivr will not allow us to distribute .html files for security reasons, but we can probably allow access to those via a different subdomain if it's useful. The other alternative is to put up some very simple service that takes a
We are already generating a simple index; we just don't serve it.
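Generating a static simple index from a set of already-built wheels takes little code. The sketch below is a hypothetical stand-in, not the generator the Pyodide build already uses: it takes a mapping from wheel filename to the URL the wheel is served from (for instance a JsDelivr URL) and returns the HTML pages of a PEP 503 index, keyed by relative path. Because JsDelivr will not serve .html files, these pages would have to be published on a different host or subdomain than the wheels themselves.

```python
from __future__ import annotations

import html
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of '-', '_' and '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _page(links: str) -> str:
    return f"<!DOCTYPE html>\n<html><body>\n{links}</body></html>\n"


def build_simple_index(wheels: dict[str, str]) -> dict[str, str]:
    """Build a static PEP 503 index (hypothetical helper).

    `wheels` maps a wheel filename to the URL it is served from. The result
    maps relative paths ("index.html", "<project>/index.html") to HTML.
    """
    projects: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for filename, url in sorted(wheels.items()):
        # In a wheel filename the distribution name ends at the first "-"
        # (PEP 427 requires "-" inside names to be escaped as "_").
        projects[normalize(filename.split("-", 1)[0])].append((filename, url))

    pages = {
        "index.html": _page(
            "".join(f'<a href="{p}/">{p}</a>\n' for p in sorted(projects))
        )
    }
    for project, files in projects.items():
        pages[f"{project}/index.html"] = _page(
            "".join(
                f'<a href="{html.escape(url)}">{html.escape(name)}</a>\n'
                for name, url in files
            )
        )
    return pages
```

Pointing the project pages at absolute wheel URLs keeps the index and the file storage independent, so the HTML can live anywhere while the wheels stay on the CDN.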
Can you paste here the URL that is missing the headers, and I'll ask someone to look at it?
For instance, if one does,
there are no CORS headers in the response. As opposed to PyPI,
which yields,
I think setting just the last line in the returned headers would be sufficient. For anaconda.org, since it returns a 302 redirect to
I think pip needs range requests because of lazy wheel. Not a bad thing to add anyway.
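As we understand it, pip's lazy-wheel mode avoids downloading a whole wheel by reading the ZIP central directory and metadata near the end of the file with HTTP range requests, so the host has to answer Range requests with 206 Partial Content. A minimal sketch of the two pieces of HTTP involved, a Range header asking for the last N bytes of a file and the Content-Range header of the reply (RFC 9110 syntax); the function names are hypothetical and this does not reproduce pip's implementation. In a browser, a cross-origin client can only read Content-Range if the server also lists it in Access-Control-Expose-Headers.

```python
from __future__ import annotations

import re

# "bytes <first>-<last>/<total or *>", per RFC 9110.
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def tail_range_header(n_bytes: int) -> dict[str, str]:
    """Request header asking for the last `n_bytes` of a file (a suffix range)."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return {"Range": f"bytes=-{n_bytes}"}


def parse_content_range(value: str) -> tuple[int, int, int | None]:
    """Parse a 206 response's Content-Range, e.g. 'bytes 4032-4095/4096'.

    Returns (first_byte, last_byte, total_length); total_length is None
    when the server reports '*' (length unknown).
    """
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)
```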
Though, looking at the MSDN docs, it's not clear whether CORS in combination with a redirect to a different domain is even allowed.
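The header comparison discussed above comes down to one response header. A minimal sketch, assuming a simple, credential-less GET such as a browser fetch of an index page or a wheel: the response is readable from a page at `origin` only if Access-Control-Allow-Origin is `*` or exactly that origin. Preflight requests, credentials, and exposed headers are deliberately not modelled, and the function name is hypothetical. For a cross-origin redirect like anaconda.org's 302, our understanding of the Fetch standard is that the redirect response must pass the same check as the final response, so the header would be needed on both hops.

```python
from __future__ import annotations

from collections.abc import Mapping


def cors_allows(headers: Mapping[str, str], origin: str) -> bool:
    """True if a simple, credential-less cross-origin GET from `origin`
    may read a response carrying these headers.

    Only Access-Control-Allow-Origin is checked (hypothetical helper).
    """
    # HTTP header names are case-insensitive, so compare them lower-cased.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


# Example with the standard library (needs network access):
#   from urllib.request import urlopen
#   with urlopen(some_index_url) as resp:
#       print(cors_allows(resp.headers, "https://example.invalid"))
```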
@hoodmane, did you hear back on this by any chance? In scikit-learn we are now building a Pyodide wheel in CI (scikit-learn/scikit-learn#26374). It would be very nice to upload it to https://anaconda.org/scipy-wheels-nightly, as we (and other scientific Python packages) do for other development wheels, and have it installable in a notebook with something like this:
A gentle ping, @hoodmane, on the above :) Or please just give us the contact of the person we could ask about this. It would be really nice to upload dev wheels for scientific packages there.
I made the request and I think they added them.
Never mind, it has not been done; I tested it wrong.
Hey there! Just checking in on whether the simple index has found hosting somewhere. Thanks!
Hi @fpliger, would it be possible to get an idea of whether adding CORS headers to anaconda.org is still moving forward? This is a follow-up to pyodide/micropip#101 (comment), but I'm using this issue to keep the conversation in one place. Of course, I completely understand there might be technical complexities and political challenges, and that this is probably not at the top of Anaconda's priorities. It would still be great to have a sense of whether this is something that may happen at some point, although I understand that it is hard to give a precise timeline. For my main use case in scikit-learn (which I tried to sum up in pyodide/micropip#101 (comment)), if I get some kind of signal that this is going to happen in something like 6 months, I may actually look for other workarounds in the meantime. The first that comes to mind is to have Pyodide wheels in a GitHub repo and use the jsDelivr CDN to be able to
If the discussion is easier to have outside of a public issue tracker, I'd love to be part of it, and I guess others like @rth may be as well.
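The GitHub-plus-jsDelivr workaround mentioned here would rely on jsDelivr's `/gh/<user>/<repo>@<ref>/<path>` URL scheme, which serves files from a public GitHub repository and, as far as we know, sends permissive CORS headers. A tiny sketch of building such URLs; the helper name and the user, repository, and file names in the example are hypothetical placeholders.

```python
from urllib.parse import quote


def jsdelivr_gh_url(user: str, repo: str, ref: str, path: str) -> str:
    """Build a jsDelivr URL that serves `path` from a GitHub repo at `ref`.

    `ref` can be a branch, tag, or commit; `path` keeps its "/" separators.
    """
    return f"https://cdn.jsdelivr.net/gh/{user}/{repo}@{quote(ref)}/{quote(path)}"
```

In a Pyodide session, the resulting URL could then be passed to `await micropip.install(url)`, which accepts wheel URLs directly.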
I think we're getting quite close to being able to upload wheels to PyPI. @henryiii @di, I'd appreciate it if the PyPA people could share their opinions on what else needs to be done first. @henryiii suggested that, in his opinion, the most important step was inclusion in cibuildwheel, which we finally merged just the other day. @agriyakhetarpal, can you tell us which scientific computing packages already test against us in CI and would be ready to upload a wheel to PyPI, and which ones are on your list to add?
Thanks for the ping, @hoodmane! Here's a list that I maintain. It is by no means exhaustive: I have yet to do some of these, and some of them (like
Note: This table is also mirrored at Quansight-Labs/czi-scientific-python-mgmt#18.
Based on https://anaconda.org/scientific-python-nightly-wheels/, I haven't looked into Xarray or Uproot so far, but I'm happy to do so if needed. Also cc: @rgommers. I have tested NumPy's WASM build with
(Side comment: for scikit-learn there are no scheduled anaconda.org uploads, because I was waiting for the anaconda.org CORS headers situation to clarify. Adding scheduled uploads is probably doable in finite time, though.)
Ah, I looked at pyodide/micropip#80 and thought they were being uploaded; I've updated the entry in the table to "Planned". I'm happy to write a PR for that if you'd like, or to step back if you wish to do it yourself!
iminuit also reports
Are these out-of-tree builds with Pyodide 0.26? So far I'm 0 for 2 on 0.26 builds. Both packages are scikit-build-core based, use pybind11, and have exceptions enabled. At least boost-histogram was fine out-of-tree with <0.26. I can't find the Awkward build mentioned; the Awkward out-of-tree build is still on 3.11 in the docs. Maybe #2964 is related?
Yes, they are for NumPy, PyWavelets, scikit-image, Zarr, numcodecs, and pandas (available in the PRs for the latter four, not yet merged). I see that Awkward is yet to be upgraded to 0.26.0 and doesn't have WASM nightlies (I've updated the table), but I notice that
Great to hear that! Note that even if Pyodide wheels on PyPI become possible,
Now that we have switched to wheels for packages, our current distribution model reaches its limits as it doesn't allow,
This issue aims to discuss what an improved hosting solution could look like and evaluate several possible directions. Of course, the question remains whether the added benefits would be worth the extra maintenance effort.
From a usage perspective, I think what we want is a PyPI-like mirror, where users (us included) can upload packages with twine using an API key. Internally, packages would still be uploaded to the same S3 bucket and distributed via JsDelivr, so the system distributing packages (i.e. S3 + JsDelivr) is separate from the package index website. This way we can guarantee very good package availability, even if there is some maintenance downtime on the package index website.
Aside from the above-listed functionality, I think we would need,
Bonus features,
meta.yaml, or automatic package analysis for potentially unsupported functionality)
If we want to use an existing OSS solution, there are the following possibilities,
piwheels
I'll add my evaluation of these solutions below later.
At the same time, we should keep in mind that in the long term wasm/emscripten wheels might be supported by PyPI, so maybe it's not worth spending too much effort on this. However, even if PyPI does support them, it might not include the web-specific optimizations that we are able to do.
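Whichever index ends up hosting these wheels, whether a custom mirror or eventually PyPI, has to tell Emscripten wheels apart from ordinary ones, and the platform tag in the wheel filename is where that information lives. A minimal sketch following the PEP 427 filename convention (compressed tag sets are split on "."); the helper names are hypothetical, and the two accepted tag patterns, `emscripten_*_wasm32` and `pyodide_*_wasm32`, are an assumption about how Pyodide releases tag their wheels rather than something this issue specifies.

```python
from __future__ import annotations

from typing import NamedTuple


class WheelName(NamedTuple):
    name: str
    version: str
    build: str | None
    python_tags: tuple[str, ...]
    abi_tags: tuple[str, ...]
    platform_tags: tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"malformed wheel filename: {filename!r}")
    # Compressed tag sets use "." as a separator, e.g. "py2.py3".
    return WheelName(
        name, version, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )


def is_emscripten_wheel(filename: str) -> bool:
    # Assumption: Pyodide wheels carry an "emscripten_*_wasm32" or a
    # "pyodide_*_wasm32" platform tag.
    return any(
        tag.startswith(("emscripten_", "pyodide_")) and tag.endswith("_wasm32")
        for tag in parse_wheel_filename(filename).platform_tags
    )
```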