[Experiment] WASM IPLD Codecs and ADLs #9016

Draft
wants to merge 12 commits into
base: master

aschmahmann
Contributor

This PR is an experiment exploring how go-ipfs could leverage IPLD Codecs and ADLs written in WebAssembly.

Some pieces here have nothing to do with WASM and could be extracted:

  • Plugin interface for specifying new ADLs
  • Prototype of leveraging selectors on the HTTP gateway endpoint to render any IPLD node presenting as the Bytes kind as a file

This allows loading codecs and ADLs via the config file, e.g.

  "Plugins": {
    "Plugins": {
      "ipld-wasmipld": {
        "Config": {
          "Codecs": [
            {
              "Code": "bencode",
              "Encode": true,
              "Decode": true,
              "WasmPath": "/foo/bar/bencode.wasm"
            }
          ],
          "ADLs": [
            {
              "Name": "bittorrentv1-directory",
              "WasmPath": "/baz/bt_dirv1.wasm"
            }
          ]
        }
      }
    }
  }
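
As a rough illustration of how the "Config" object above could be consumed, here is a minimal Go sketch that decodes it and checks the required fields. The type and function names (CodecConfig, ADLConfig, WasmIPLDConfig, parseWasmIPLDConfig) and the validation rules are assumptions made for this example; only the JSON keys come from the config shown above, and the plugin in this PR may structure things differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CodecConfig mirrors one entry of "Codecs". The Go names are hypothetical;
// only the JSON keys are taken from the example config.
type CodecConfig struct {
	Code     string `json:"Code"`
	Encode   bool   `json:"Encode"`
	Decode   bool   `json:"Decode"`
	WasmPath string `json:"WasmPath"`
}

// ADLConfig mirrors one entry of "ADLs".
type ADLConfig struct {
	Name     string `json:"Name"`
	WasmPath string `json:"WasmPath"`
}

// WasmIPLDConfig mirrors the "Config" object of the plugin entry.
type WasmIPLDConfig struct {
	Codecs []CodecConfig `json:"Codecs"`
	ADLs   []ADLConfig   `json:"ADLs"`
}

// parseWasmIPLDConfig decodes the "Config" object and rejects entries that
// lack an identifier or a WASM path (an assumed rule, not taken from the PR).
func parseWasmIPLDConfig(raw []byte) (*WasmIPLDConfig, error) {
	var cfg WasmIPLDConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("decoding plugin config: %w", err)
	}
	for i, c := range cfg.Codecs {
		if c.Code == "" || c.WasmPath == "" {
			return nil, fmt.Errorf("codec %d: Code and WasmPath are required", i)
		}
	}
	for i, a := range cfg.ADLs {
		if a.Name == "" || a.WasmPath == "" {
			return nil, fmt.Errorf("ADL %d: Name and WasmPath are required", i)
		}
	}
	return &cfg, nil
}

func main() {
	raw := []byte(`{
		"Codecs": [{"Code": "bencode", "Encode": true, "Decode": true, "WasmPath": "/foo/bar/bencode.wasm"}],
		"ADLs": [{"Name": "bittorrentv1-directory", "WasmPath": "/baz/bt_dirv1.wasm"}]
	}`)
	cfg, err := parseWasmIPLDConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Codecs[0].Code, cfg.ADLs[0].Name)
}
```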

The work describing how to make WASM code that's compatible is being explored in https://github.com/aschmahmann/wasm-ipld. ipld/wasm-ipld#2 is the latest draft.

You can just do something like cargo build --target wasm32-unknown-unknown --release in the wasmlib folder to generate the wasm blobs for inclusion and play around with it.

If you want to see custom codecs/ADLs in action it can be a bit of a pain since you have to actually write the data somewhere and then import it to go-ipfs (e.g. ipfs block put/ipfs dag put/ipfs dag import). If you want to see a koala picture inside a BitTorrent folder load over a go-ipfs HTTP gateway then:

  1. Add each of the blocks in the fixtures folder to go-ipfs with SHA-1
  2. Add the animals.infodict block to go-ipfs with SHA-1
  3. Go to http://localhost:8080/ipfs/f01631114d55f4390b4e6f5c980ff06340beda9bddd6ff926?selector=bafyqanvbmf7keyj6ufqwnilcmy7kc2kln5qwyyjonjygpilbf2qgeyltozrgs5dun5zhezloor3dcllenfzgky3un5zhs
  4. Profit (PS: you can also do ipfs dag get on that selector to see what it looks like in dag-json)
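
For readers wondering what the CID in step 3 encodes, the following Go sketch splits it into its fields using only the standard library. The helper name splitBase16CID and the cidParts type are made up for this example; real code would normally use the go-cid package instead. The leading "f" is the multibase prefix for lowercase base16, and 0x63 and 0x11 are the multicodec table entries for bencode and sha1.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// cidParts holds the fields of a CIDv1 (hypothetical helper type).
type cidParts struct {
	Version uint64
	Codec   uint64
	HashFn  uint64
	Digest  []byte
}

// splitBase16CID decodes a base16 ("f"-prefixed) CIDv1 string into
// version, content codec, hash function code and digest.
func splitBase16CID(s string) (cidParts, error) {
	var p cidParts
	if len(s) < 2 || s[0] != 'f' {
		return p, errors.New("expected multibase prefix 'f' (lowercase base16)")
	}
	raw, err := hex.DecodeString(s[1:])
	if err != nil {
		return p, err
	}
	// Version, codec and hash function are unsigned varints, in that order.
	for _, field := range []*uint64{&p.Version, &p.Codec, &p.HashFn} {
		v, n := binary.Uvarint(raw)
		if n <= 0 {
			return p, errors.New("truncated varint")
		}
		*field = v
		raw = raw[n:]
	}
	// The digest is prefixed with its length, also as a varint.
	length, n := binary.Uvarint(raw)
	if n <= 0 {
		return p, errors.New("truncated digest length")
	}
	raw = raw[n:]
	if uint64(len(raw)) != length {
		return p, fmt.Errorf("digest length %d does not match %d remaining bytes", length, len(raw))
	}
	p.Digest = raw
	return p, nil
}

func main() {
	p, err := splitBase16CID("f01631114d55f4390b4e6f5c980ff06340beda9bddd6ff926")
	if err != nil {
		panic(err)
	}
	// prints: version=1 codec=0x63 hash=0x11 digest=20 bytes
	fmt.Printf("version=%d codec=0x%x hash=0x%x digest=%d bytes\n",
		p.Version, p.Codec, p.HashFn, len(p.Digest))
}
```

The 20-byte SHA-1 digest is why the blocks in steps 1 and 2 have to be added with SHA-1 rather than the default hash function.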

Contributor

@willscott willscott left a comment

It'll be interesting to see how this setup performs when we end up with a WASM ADL for something like UnixFS, where there are multiple recursive rounds of reification as we traverse down a directory tree. I'm worried that jumping in and out of WASM for each ADL evaluation may add up pretty quickly, but I bet in practice this setup works fine for a lot of IPFS use cases.

Very excited for this future! 🚀

Comment on lines +106 to +109
if cl, ok := link.(cidlink.Link); !ok {
	return nil, fmt.Errorf("cannot process link: %v", link)
} else {
	block, err := api.blocks.GetBlock(linkContext.Ctx, cl.Cid)
Contributor

I think there's regret around the structure we ended up with here: links in practice need to be cidlink.Link, yet we do this check every time to pull out the Cid.

Instead, I think the pattern ipld-prime has been hoping to transition to is to use cid.Cast([]byte(link.Binary()))

Contributor Author

Makes sense. I thought this pattern was what we've got at the moment. Also, wouldn't cid.Cast do the parsing again?

@@ -0,0 +1,209 @@
---
title: "WAC Specification"
Contributor

what is 'wac'?

Contributor Author

This will get extracted (maybe with a better name?) and made into a spec PR to the IPLD repo, but basically I created a new IPLD codec called WAC (WebAssembly codec). The name isn't great; it's just named after the first use case I had in mind.

The idea is basically to have a simple codec that fits the IPLD data model, which both:
a) Gives us something to discuss about what the IPLD data model is
b) Means I can get data in/out of WASM losslessly without requiring tons of calls to extract each integer, string, byte array, etc. manually. I can copy entire nested maps of nested maps of lists and it's fine

registry = &wasmRegistry{}

for _, c := range cfg.Codecs {
	wasm, err := ioutil.ReadFile(c.WasmPath)
Contributor

how much more annoying would it be to reference the wasm as a cid in the block store rather than an on-disk path?

Contributor

@Jorropo Jorropo Jun 9, 2022

Wouldn't that lead to some kind of weird attacks, where you could create a format that shows up differently based on some unrepeatable side effects?

(I guess we could configure the WASM interpreter to be deterministic and repeatable (no random, or io available, ...)).

Contributor Author

how much more annoying would it be to reference the wasm as a cid in the block store rather than an on-disk path?

Not that much more, I'd like to be able to do that. Some things I think we'd want to do to enable this include:

  1. A way to protect the WASM blocks from getting cleared by GC (should be pretty easy)
  2. At least for now, assert a 2MiB limit on WASM blobs so that they must be Raw blocks
  3. Let the plugin get access to a blockservice
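
A minimal sketch of the size check from item 2, assuming the WASM module has already been fetched as a single raw block. The names maxWasmSize, wasmHeader and checkWasmBlock are hypothetical. The header check is an addition that the list above does not mention; it relies on the fact that a WebAssembly binary starts with the bytes 0x00 'a' 's' 'm' followed by the version 1 in little-endian form.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// maxWasmSize is the 2MiB limit suggested above (hypothetical constant name).
const maxWasmSize = 2 << 20

// wasmHeader is the WebAssembly binary magic followed by version 1.
var wasmHeader = []byte{0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00}

// checkWasmBlock rejects blobs that exceed the size limit or that do not
// look like a WebAssembly binary.
func checkWasmBlock(data []byte) error {
	if len(data) > maxWasmSize {
		return fmt.Errorf("wasm blob is %d bytes, limit is %d", len(data), maxWasmSize)
	}
	if !bytes.HasPrefix(data, wasmHeader) {
		return errors.New("block does not start with the WebAssembly magic and version")
	}
	return nil
}

func main() {
	fmt.Println(checkWasmBlock(wasmHeader))      // prints: <nil>
	fmt.Println(checkWasmBlock([]byte("hello"))) // prints the header error
}
```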

	return nil
}

const fuelPerOp = 10_000_000
Contributor

maybe make configurable?

Contributor Author

ya, makes sense. Just left it there for now, not sure what makes sense for limits. Could certainly allow limits to be per-module or something if people want to mess with things. Could also remove the limits by default 🤷

None of our codecs or ADLs have any sort of limits like this at the moment. It's worth considering what this means in terms of helping users make their code predictably deployable. Wouldn't want people to be surprised that some of their data loaded with implementation A, but not B due to limits here. Obviously people can do whatever they want as with block size limits, but we probably want some reasonably large safe boundaries to let people work with.
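
One way the per-module idea could look, sketched under the assumption that limits are optional at both the plugin and the module level and fall back to the current constant. The Limits type, its FuelPerOp field and resolveFuel are hypothetical names, not part of this PR.

```go
package main

import "fmt"

// defaultFuelPerOp matches the constant currently hard-coded in the PR.
const defaultFuelPerOp uint64 = 10_000_000

// Limits holds optional resource limits (hypothetical type).
// A nil FuelPerOp means "not set at this level".
type Limits struct {
	FuelPerOp *uint64
}

// resolveFuel picks the most specific limit that is set:
// module level first, then plugin level, then the built-in default.
func resolveFuel(plugin, module Limits) uint64 {
	if module.FuelPerOp != nil {
		return *module.FuelPerOp
	}
	if plugin.FuelPerOp != nil {
		return *plugin.FuelPerOp
	}
	return defaultFuelPerOp
}

func main() {
	pluginFuel := uint64(50_000_000)
	moduleFuel := uint64(1_000_000)

	fmt.Println(resolveFuel(Limits{}, Limits{}))                                             // prints: 10000000
	fmt.Println(resolveFuel(Limits{FuelPerOp: &pluginFuel}, Limits{}))                       // prints: 50000000
	fmt.Println(resolveFuel(Limits{FuelPerOp: &pluginFuel}, Limits{FuelPerOp: &moduleFuel})) // prints: 1000000
}
```

Using a pointer keeps "unset" distinct from an explicit value, which would leave room for a separate convention for removing the limit entirely.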

Comment on lines +461 to +462
var fnBuildSel func(comps []string) (builder.SelectorSpec, error)
fnBuildSel = func(comps []string) (builder.SelectorSpec, error) {
Contributor

Suggested change
- var fnBuildSel func(comps []string) (builder.SelectorSpec, error)
- fnBuildSel = func(comps []string) (builder.SelectorSpec, error) {
+ fnBuildSel := func(comps []string) (builder.SelectorSpec, error) {

Contributor Author

Pretty sure the code won't compile if you do that since the function is recursive

Contributor

let me check

Contributor

Ok you are right, it's cursed to have a recursive virtual call to a function pointer if you aren't doing DP.

I would move this out as a private function on the global scope.
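
To make the two options in this thread concrete, here is a small self-contained Go sketch. It uses a toy string-building function rather than the selector builder from the PR; nestWithClosure and nestPath are made-up names. In Go, the scope of a variable introduced with := begins only after the declaration statement, so a function literal on the right-hand side cannot call itself by that name, which is why the two-step var form is needed for a recursive closure.

```go
package main

import (
	"fmt"
	"strings"
)

// Option 1: declare the variable first so the function literal can refer to it.
// Writing `nest := func(...) { ... nest(...) ... }` would not compile, because
// nest is not yet in scope inside the literal.
func nestWithClosure(comps []string) string {
	var nest func(rest []string) string
	nest = func(rest []string) string {
		if len(rest) == 0 {
			return "leaf"
		}
		return rest[0] + "(" + nest(rest[1:]) + ")"
	}
	return nest(comps)
}

// Option 2: an unexported package-level function, which can recurse directly.
func nestPath(comps []string) string {
	if len(comps) == 0 {
		return "leaf"
	}
	return comps[0] + "(" + nestPath(comps[1:]) + ")"
}

func main() {
	comps := strings.Split("a/b/c", "/")
	fmt.Println(nestWithClosure(comps)) // prints: a(b(c(leaf)))
	fmt.Println(nestPath(comps))        // prints: a(b(c(leaf)))
}
```

The package-level form avoids the indirect call through a function variable, at the cost of passing any captured state as explicit parameters.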

@BigLep
Contributor

BigLep commented Sep 1, 2022

While @aschmahmann is out, I'm not imagining anyone will be driving it forward. I think relevant parties are aware of the code in case it is relevant in the short term.

@aschmahmann aschmahmann mentioned this pull request Aug 28, 2023
3 tasks