Robin Berjon committed Dec 10, 2024
1 parent d4850ed, commit ccf4e42
Showing 6 changed files with 205 additions and 77 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,43 +2,52 @@ | |
<meta charset="UTF-8"> | ||
<meta name="viewport" content="width=device-width, initial-scale=1.0"> | ||
<title>Content IDs (CIDs)</title> | ||
<link rel="stylesheet" href="spec.css"><link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><rect x=%220%22 y=%220%22 width=%22100%22 height=%22100%22 fill=%22%2300ff75%22></rect></svg>"><meta name="twitter:card" content="summary_large_image"><meta name="twitter:title" property="og:title" content="DASL: Content IDs (CIDs)"><meta name="twitter:description" property="og:description" content="DASL CIDs are a strict subset of IPFS CIDs (but you don't need to understanding anything about IPFS to either use or implement them) with the following properties:"><meta name="twitter:image" property="og:image" content="https://dasl.ing/banner.png"><meta name="twitter:image:alt" content="Very colourful stripes, so colourful it hurts"><meta name="twitter:url" property="og:url" content="https://dasl.ing/"><meta property="og:site_name" content="DASL"><meta property="og:locale" content="en"><meta name="theme-color" content="#00ff75"></head> | ||
<body><div class="nav-back">A specification of the <a href="/">DASL Project</a>.</div><main><header><h1>Content IDs (CIDs)</h1><table><tbody><tr><th>date</th><td>2024-12-10</td></tr><tr><th>editors</th><td><a href="https://berjon.com/">Robin Berjon</a> <<a href="mailto:[email protected]">[email protected]</a>> & <a href="https://bumblefudge.com/">Juan Caballero</a> <<a href="mailto:[email protected]">[email protected]</a>></td></tr><tr><th>abstract</th><td><div id="abstract"> | ||
<link rel="stylesheet" href="spec.css"><link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><rect x=%220%22 y=%220%22 width=%22100%22 height=%22100%22 fill=%22%2300ff75%22></rect></svg>"><meta name="twitter:card" content="summary_large_image"><meta name="twitter:title" property="og:title" content="DASL: Content IDs (CIDs)"><meta name="twitter:description" property="og:description" content="DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash with enough metadata to be extensible (to add new hash types in the future) and to indicate whether they are pointing to raw bytes or to structured data."><meta name="twitter:image" property="og:image" content="https://dasl.ing/banner.png"><meta name="twitter:image:alt" content="Very colourful stripes, so colourful it hurts"><meta name="twitter:url" property="og:url" content="https://dasl.ing/"><meta property="og:site_name" content="DASL"><meta property="og:locale" content="en"><meta name="theme-color" content="#00ff75"></head> | ||
<body><div class="nav-back">A specification of the <a href="/">DASL Project</a>.</div><main><header><h1>Content IDs (CIDs)</h1><table><tbody><tr><th>date</th><td>2024-12-10</td></tr><tr><th>editors</th><td><a href="https://berjon.com/">Robin Berjon</a> <<a href="mailto:[email protected]">[email protected]</a>><br><a href="https://bumblefudge.com/">Juan Caballero</a> <<a href="mailto:[email protected]">[email protected]</a>></td></tr><tr><th>issues</th><td><a href="https://github.com/darobin/dasl.ing/issues">list</a>, <a href="https://github.com/darobin/dasl.ing/issues/new">new</a></td></tr><tr><th>abstract</th><td><div id="abstract"> | ||
<p> | ||
DASL CIDs are a strict subset of <a href="https://docs.ipfs.tech/concepts/content-addressing/">IPFS CIDs</a> | ||
(but you don't need to understanding anything about IPFS to either use or implement them) with the following properties: | ||
DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash | ||
with enough metadata to be extensible (to add new hash types in the future) and to indicate whether | ||
they are pointing to raw bytes or to structured data. | ||
</p> | ||
</div></td></tr></tbody></table></header> | ||
|
||
<section> | ||
<h2>Parsing CIDs</h2> | ||
<ul> | ||
<li>Only modern CIDv1 CIDs are used, not legacy CIDv0.</li> | ||
<h2>Introduction</h2> | ||
<p> | ||
DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash | ||
with enough metadata to be extensible (to add new hash types in the future) and to indicate whether | ||
they are pointing to raw bytes or to structured data. If you're simply using DASL CIDs as identifiers, you | ||
can almost certainly just use the string as an opaque ID and worry no further. | ||
</p> | ||
<p> | ||
A DASL CID can be represented as a string or as an array of bytes. If you wish to understand the | ||
internals of a CID, it has the following structure: | ||
</p> | ||
<ol> | ||
<li> | ||
Only the lowercase base32 multibase encoding (the <code>b</code> prefix) is used for human-readable | ||
(and subdomain-usable) string encoding. | ||
A <code>b</code> prefix (only in string form). This is an extensibility point that allows | ||
string encodings other than base32 to be supported in the future. (Currently, base32 is the only one.) | ||
</li> | ||
<li> | ||
Only the <code>raw</code> binary multicodec (0x55) and <code>dag-cbor</code> multicodec (0x71), with the | ||
latter used only for dCBOR42-conformant DAGs. | ||
A version number, which is currently always 1. | ||
</li> | ||
<li>Only SHA-256 (0x12) and BLAKE3 hash functions (0x1e), and the latter only in certain circumstances.</li> | ||
<li> | ||
Regardless of size, resources <em>should not</em> be "chunked" into a DAG or Merkle tree (as historically done with | ||
UnixFS canonicalization in IPFS systems) but rather hashed in their entirety and content-addressed directly. | ||
A content codec, which is a flag indicating whether it is pointing to structured or raw | ||
data. | ||
</li> | ||
<li> | ||
This set of options has the added advantage that all the aforementioned single-byte prefixes require no | ||
additional varint processing or byte-fiddling. | ||
A hash type, which is always SHA-256 ([<a href="#ref-sha256" class="ref">sha256</a>]). | ||
</li> | ||
</ul> | ||
<p> | ||
Supporting two hashes isn't ideal, but having one hash type that can stream large resources (and do incremental | ||
verification mid-stream) is a plus. Because BLAKE3 is still far from being supported by web browsers, it is | ||
strongly recommended that CID producers limit themselves to SHA-256 if possible. Implementations intending to | ||
run in web contexts are likely to either forego BLAKE3 verification in-browser, outsource verification to a | ||
trusted component, or to have to dynamically load a BLAKE3 library in the browser, which may cause latency. | ||
</p> | ||
<li> | ||
A hash size, indicating how many bytes long the digest is. | ||
</li> | ||
<li> | ||
A digest, which is the hash of the content being identified. | ||
</li> | ||
</ol> | ||
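As a minimal sketch of the structure listed above, the following Python assembles a string CID for raw bytes. The helper name `make_raw_cid` is hypothetical, and the sketch assumes that the base32 output is lowercased with its `=` padding stripped, which this excerpt does not spell out.

```python
import base64
import hashlib

RAW_CODEC = 0x55  # content codec: raw bytes
SHA2_256 = 0x12   # hash type: SHA-256


def make_raw_cid(content: bytes) -> str:
    """Hypothetical helper: build a string CID for raw content."""
    digest = hashlib.sha256(content).digest()
    # Layout: version (1), content codec, hash type, hash size, digest.
    cid_bytes = bytes([0x01, RAW_CODEC, SHA2_256, len(digest)]) + digest
    # Assumption: lowercase base32 with '=' padding removed.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    # The 'b' prefix only exists in the string form.
    return "b" + encoded


print(make_raw_cid(b"hello"))
```

Because the first four bytes are fixed for raw SHA-256 content, every CID produced this way begins with `bafkrei`.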
</section> | ||
<section> | ||
<h2>Parsing CIDs</h2> | ||
<p> | ||
Use the following steps to <dfn id="dfn-parse-a-cid-string">parse a CID string</dfn>: | ||
</p> | ||
|
@@ -47,8 +56,9 @@ <h2>Parsing CIDs</h2> | |
<li>Remove the first character from <var>CID</var> and store it in <var>prefix</var>.</li> | ||
<li>If <var>prefix</var> is not equal to <code>b</code>, throw an error.</li> | ||
<li> | ||
Decode the rest of <var>CID</var> using <a href="https://datatracker.ietf.org/doc/html/rfc4648#section-6">the | ||
base32 algorithm from RFC4648</a> with a lowercase alphabet and store the result in <var>CID bytes</var> ([[<a href="#ref-rfc4648" class="ref">rfc4648</a>]]). | ||
Decode the rest of <var>CID</var> using | ||
<a href="https://datatracker.ietf.org/doc/html/rfc4648#section-6">the base32 algorithm from | ||
RFC4648</a> with a lowercase alphabet and store the result in <var>CID bytes</var> ([<a href="#ref-rfc4648" class="ref">rfc4648</a>]). | ||
</li> | ||
<li>Return the result of applying the steps to <a href="#dfn-decode-a-cid" class="dfn-ref">decode a CID</a> to <var>CID bytes</var>.</li> | ||
</ol> | ||
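The string-parsing steps above can be sketched in Python as follows. This is an illustration rather than a reference implementation: the function name `parse_cid_string_to_bytes` is hypothetical, it stops at producing the CID bytes (the decode step is handled separately), and it assumes the input carries no `=` padding.

```python
import base64

# RFC 4648 base32 alphabet, lowercased.
BASE32_LOWER = set("abcdefghijklmnopqrstuvwxyz234567")


def parse_cid_string_to_bytes(cid: str) -> bytes:
    """Hypothetical helper: check the prefix and base32-decode the rest."""
    if not cid:
        raise ValueError("empty CID string")
    prefix, rest = cid[0], cid[1:]
    if prefix != "b":
        raise ValueError(f"unsupported prefix: {prefix!r}")
    # Reject uppercase letters and padding: only the lowercase alphabet is allowed.
    if any(ch not in BASE32_LOWER for ch in rest):
        raise ValueError("invalid character in base32 data")
    # base64.b32decode expects uppercase input padded to a multiple of 8 characters.
    padded = rest.upper() + "=" * (-len(rest) % 8)
    return base64.b32decode(padded)
```

The returned bytes are what the decoding steps operate on.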
|
@@ -60,7 +70,10 @@ <h2>Parsing CIDs</h2> | |
<li> | ||
Remove the first byte in <var>binary CID</var> and store it in <var>prefix</var>. | ||
</li> | ||
<li>If <var>prefix</var> is not equal to <code>0</code> (a null byte, the binary base256 prefix), throw an error.</li> | ||
<li> | ||
If <var>prefix</var> is not equal to <code>0</code> (a null byte, the binary base256 | ||
prefix), throw an error. | ||
</li> | ||
<li>Store the rest of <var>binary CID</var> in <var>CID bytes</var>.</li> | ||
<li>Return the result of applying the steps to <a href="#dfn-decode-a-cid" class="dfn-ref">decode a CID</a> to <var>CID bytes</var>.</li> | ||
</ol> | ||
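The binary form differs from the string form only in its prefix handling. A minimal sketch, with the hypothetical name `parse_binary_cid_to_bytes`, might look like this:

```python
def parse_binary_cid_to_bytes(binary_cid: bytes) -> bytes:
    """Hypothetical helper: strip the null-byte prefix from a binary CID."""
    if len(binary_cid) == 0:
        raise ValueError("empty binary CID")
    prefix = binary_cid[0]
    # 0x00 is the binary (base256) prefix; anything else is an error.
    if prefix != 0x00:
        raise ValueError(f"unsupported binary prefix: {prefix:#04x}")
    # Everything after the prefix is the CID bytes to decode.
    return binary_cid[1:]
```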
|
@@ -76,17 +89,65 @@ <h2>Parsing CIDs</h2> | |
<li> | ||
Remove the next byte in <var>CID bytes</var> and store it in <var>codec</var>. | ||
</li> | ||
<li>If <var>codec</var> is not equal to <code>0x55</code> (raw) or <code>0x71</code> (dCBOR42), throw an error ([[<a href="#ref-dcbor42" class="ref">dcbor42</a>]]).</li> | ||
<li> | ||
Remove the next two bytes in <var>CID bytes</var> and store them in <var>hash type</var> and <var>hash size</var>, | ||
respectively. | ||
If <var>codec</var> is not equal to <code>0x55</code> (raw) or <code>0x71</code> (dCBOR42), | ||
throw an error ([<a href="#ref-dcbor42" class="ref">dcbor42</a>]). | ||
</li> | ||
<li> | ||
Remove the next two bytes in <var>CID bytes</var> and store them in <var>hash type</var> and | ||
<var>hash size</var>, respectively. | ||
</li> | ||
<li> | ||
If <var>hash type</var> is not equal to <code>0x12</code> (SHA-256) or <code>0x1e</code> (BLAKE3), | ||
throw an error ([<a href="#ref-sha256" class="ref">sha256</a>], [<a href="#ref-blake3" class="ref">blake3</a>]). | ||
</li> | ||
<li> | ||
If there are fewer than <var>hash size</var> bytes left in <var>CID bytes</var>, throw an error. | ||
</li> | ||
<li> | ||
Remove the first <var>hash size</var> bytes from <var>CID bytes</var> and store them in | ||
<var>digest</var>. Store the rest in <var>remaining bytes</var>. | ||
</li> | ||
<li> | ||
Return <var>version</var>, <var>codec</var>, <var>hash type</var>, <var>hash size</var>, | ||
<var>digest</var>, and <var>remaining bytes</var>. | ||
</li> | ||
<li>If <var>hash type</var> is not equal to <code>0x12</code> (SHA-256) or <code>0x1e</code> (BLAKE3), throw an error. ([[<a href="#ref-sha256" class="ref">sha256</a>]], [[<a href="#ref-blake3" class="ref">blake3</a>]])</li> | ||
<li>If there are fewer than <var>hash size</var> bytes left in <var>CID bytes</var>, throw an error.</li> | ||
<li>Remove the first <var>hash size</var> bytes from <var>CID bytes</var> and store them in <code>digest</code>. Store the rest in <var>remaining bytes</var>.</li> | ||
<li>Return <var>version</var>, <var>codec</var>, <var>hash type</var>, <var>hash size</var>, <var>digest</var>, and <var>remaining bytes</var>.</li> | ||
</ol> | ||
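The decoding steps can be sketched in Python as below. The function name `decode_cid` and the tuple return shape are illustrative choices. The version check is an assumption based on the Introduction's statement that the version is currently always 1, since the version step itself is not visible in this hunk. The sketch accepts both hash types that the listed steps accept.

```python
ALLOWED_CODECS = {0x55, 0x71}      # raw, dCBOR42
ALLOWED_HASH_TYPES = {0x12, 0x1E}  # SHA-256, BLAKE3 (as in the listed steps)


def decode_cid(cid_bytes: bytes):
    """Hypothetical helper: split CID bytes into their fields."""
    if len(cid_bytes) < 4:
        raise ValueError("CID too short")
    version = cid_bytes[0]
    # Assumption: only version 1 is valid.
    if version != 1:
        raise ValueError(f"unsupported version: {version}")
    codec = cid_bytes[1]
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported codec: {codec:#04x}")
    hash_type, hash_size = cid_bytes[2], cid_bytes[3]
    if hash_type not in ALLOWED_HASH_TYPES:
        raise ValueError(f"unsupported hash type: {hash_type:#04x}")
    rest = cid_bytes[4:]
    if len(rest) < hash_size:
        raise ValueError("fewer bytes left than the hash size")
    digest, remaining = rest[:hash_size], rest[hash_size:]
    return version, codec, hash_type, hash_size, digest, remaining
```

Returning the remaining bytes lets a caller decide how to treat data that follows the digest.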
</section> | ||
<section> | ||
<h2>Relationship to IPFS</h2> | ||
<p> | ||
You don't need to understand IPFS in order to use DASL. This section is for informational | ||
purposes only. | ||
</p> | ||
<p> | ||
DASL CIDs are a strict subset of <a href="https://docs.ipfs.tech/concepts/content-addressing/">IPFS CIDs</a> | ||
with the following properties: | ||
</p> | ||
<ul> | ||
<li>Only modern CIDv1 CIDs are used, not legacy CIDv0.</li> | ||
<li> | ||
Only the lowercase base32 multibase encoding (the <code>b</code> prefix) is used for human-readable | ||
(and subdomain-usable) string encoding. | ||
</li> | ||
<li> | ||
Only the <code>raw</code> binary multicodec (0x55) and <code>dag-cbor</code> multicodec (0x71), with the | ||
latter used only for dCBOR42-conformant DAGs. | ||
</li> | ||
<li>Only SHA-256 (0x12) is used as the hash function.</li> | ||
<li> | ||
The CID isn't the boss of anyone, but the expectation is that, regardless of size, resources | ||
<em>should not</em> be "chunked" into a DAG or Merkle tree (as historically done with UnixFS canonicalization | ||
in IPFS systems) but rather hashed in their entirety and content-addressed directly. That being | ||
said, a DASL CID can point to a piece of dCBOR42 metadata that describes this kind of | ||
chunking, if needed. (A separate specification may be added for that.) | ||
</li> | ||
<li> | ||
This set of options has the added advantage that all the aforementioned single-byte prefixes require no | ||
additional varint processing or byte-fiddling. | ||
</li> | ||
</ul> | ||
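To illustrate the last point: in the broader CID format these fields are unsigned varints, which encode any value below 0x80 as that single byte. The short sketch below uses a hypothetical `encode_uvarint` helper to check that each value used here fits in one byte, which is why a parser can read them directly.

```python
def encode_uvarint(value: int) -> bytes:
    """Hypothetical helper: unsigned varint (LEB128-style) encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation bit.
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# Version, both codecs, SHA-256, and the 32-byte hash size.
for field in (0x01, 0x55, 0x71, 0x12, 0x20):
    assert encode_uvarint(field) == bytes([field])
```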
</section> | ||
|
||
|
||
<section><h2>References</h2><dl><dt id="ref-blake3">[blake3]</dt><dd>J-P. Aumasson, S. Neves, J. O'Connor, Z. Wilcox. <a href="https://www.ietf.org/archive/id/draft-aumasson-blake3-00.html"><cite>The BLAKE3 Hashing Framework</cite></a>. July 2024. URL: <a href="https://www.ietf.org/archive/id/draft-aumasson-blake3-00.html">https://www.ietf.org/archive/id/draft-aumasson-blake3-00.html</a></dd><dt id="ref-dcbor42">[dcbor42]</dt><dd>Robin Berjon & Juan Caballero. <a href="https://dasl.ing/dcbor42.html"><cite>Deterministic CBOR with tag 42 (dCBOR42)</cite></a>. 2024-12-10. URL: <a href="https://dasl.ing/dcbor42.html">https://dasl.ing/dcbor42.html</a></dd><dt id="ref-rfc4648">[rfc4648]</dt><dd>S. Josefsson. <a href="https://www.rfc-editor.org/rfc/rfc4648"><cite>The Base16, Base32, and Base64 Data Encodings</cite></a>. October 2006. URL: <a href="https://www.rfc-editor.org/rfc/rfc4648">https://www.rfc-editor.org/rfc/rfc4648</a></dd><dt id="ref-sha256">[sha256]</dt><dd>National Institute of Standards and Technology, <cite>Secure Hash Algorithm. NIST FIPS 180-2</cite>. August 2002.</dd></dl></section></main></body></html> |