<pre class='metadata'>
Group: WHATWG
H1: <code>multipart/form-data</code>
Title: multipart/form-data
Shortname: formdata
Status: DREAM
Text Macro: TWITTER (unused)
Text Macro: LOGO https://resources.whatwg.org/logo.svg
Metadata Include: Participate off, Commits off, Tests off
!Repository: <a href=https://github.com/andreubotella/multipart-form-data>GitHub</a>
Abstract: A web-spec definition of the <dfn export><code>multipart/form-data</code></dfn> format and
related algorithms, meant for inclusion in the WHATWG standards.
</pre>
<pre class="anchors">
url:https://httpwg.org/specs/rfc7230.html#header.fields;text:field-name;type:dfn;spec:http
</pre>
<code>multipart/form-data</code> serializing {#serializing}
===========================================================
A <dfn export for="multipart/form-data" lt="boundary"><code>multipart/form-data</code>
boundary</dfn> is a [=byte sequence=] such that:
* its length is greater than or equal to 27 and less than or equal to 70, and
* it is composed of bytes in the ranges 0x30 to 0x39, 0x41 to 0x5A, or 0x61 to 0x7A, inclusive
([=ASCII alphanumeric=]), or bytes which are 0x27 ('), 0x2D (-) or 0x5F (_).
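<p class="note">The two conditions above can be checked mechanically. The following is a minimal, non-normative sketch; the function name <code>isValidBoundary</code> is hypothetical and the input is assumed to be a <code>Uint8Array</code>.</p>

```javascript
// Hypothetical helper: returns true if `bytes` (a Uint8Array) satisfies
// both conditions of the boundary definition.
function isValidBoundary(bytes) {
  // Length condition: between 27 and 70 bytes, inclusive.
  if (bytes.length < 27 || bytes.length > 70) return false;
  // Byte condition: ASCII alphanumeric, or one of ' - _
  return bytes.every((b) =>
    (b >= 0x30 && b <= 0x39) || // 0-9
    (b >= 0x41 && b <= 0x5a) || // A-Z
    (b >= 0x61 && b <= 0x7a) || // a-z
    b === 0x27 || b === 0x2d || b === 0x5f // ' - _
  );
}
```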
To <dfn export for="multipart/form-data/boundary">generate</dfn> a <a
for="multipart/form-data" lt="boundary"><code>multipart/form-data</code> boundary</a>, return an
[=implementation-defined=] byte sequence which fulfills the conditions for boundaries, such that
part of it is randomly generated, with a minimum entropy of 95 bits.
<p class="note">Previous definitions of <a><code>multipart/form-data</code></a> <span
class="allow-2119">required</span> that the [=multipart/form-data/boundary=] associated with a
<code>multipart/form-data</code> payload not be present anywhere in the payload other than as a
delimiter, although they allowed for generating the [=multipart/form-data/boundary=]
probabilistically. Since this generation algorithm is separate from a payload, however, it has to
specify a minimum entropy instead. [[RFC7578]] [[RFC2046]]
<p class="note">If a user agent generates <a><code>multipart/form-data</code></a> boundaries with a
length of 27 and an entropy of 95 bits, given a payload made specifically to generate collisions
with that user agent's boundaries, the expected length of the payload before a collision is found is
well over a yottabyte.
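<p class="note">As one possible way to meet the generation requirements, the sketch below appends 24 uniformly random ASCII alphanumerics (about 24 × log2(62) ≈ 142 bits of entropy) to a fixed prefix. The prefix, the function name <code>generateBoundary</code> and the use of Node.js's <code>crypto.randomInt</code> are assumptions made for illustration; the exact byte sequence remains implementation-defined.</p>

```javascript
// Assumption: a Node.js environment; randomInt draws from a CSPRNG.
const { randomInt } = require("node:crypto");

const ALPHANUMERIC =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Hypothetical generator: an 18-character fixed prefix plus 24 random
// characters gives a 42-character boundary, within the 27..70 range.
function generateBoundary() {
  let boundary = "formdata-boundary-";
  for (let i = 0; i < 24; i++) {
    // Each character is drawn uniformly from 62 possibilities.
    boundary += ALPHANUMERIC[randomInt(ALPHANUMERIC.length)];
  }
  return boundary; // ASCII only, so each character is exactly one byte
}
```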
<hr>
<div algorithm="escape a multipart/form-data name">
To <dfn>escape a <code>multipart/form-data</code> name</dfn> with a string |name|, an optional
[=/encoding=] |encoding| (default [=UTF-8=]) and an optional boolean <dfn for="escape
name">|isFilename|</dfn> (default false):
1. If |isFilename| is true:
1. Set |name| to the result of [=string/converting=] |name| into a [=scalar value string=].
1. Otherwise:
1. [=Assert=]: |name| is a [=scalar value string=].
1. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of
U+000A (LF) not preceded by U+000D (CR), in |name|, by a string consisting of U+000D (CR)
and U+000A (LF).
1. Let |encoded| be the result of [=/encode|encoding=] |name| with |encoding|.
1. Replace every 0x0A (LF) byte in |encoded| with the byte sequence `<code>%0A</code>`, every 0x0D
(CR) byte with `<code>%0D</code>`, and every 0x22 (") byte with `<code>%22</code>`.
1. Return |encoded|.
</div>
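<p class="note">The escaping steps above can be sketched as follows. This is a non-normative illustration that only handles UTF-8 (via <code>TextEncoder</code>) and assumes that newline normalization applies only when the name is not a filename; the function name <code>escapeName</code> is hypothetical.</p>

```javascript
// Hypothetical sketch of the name escaping steps, UTF-8 only.
function escapeName(name, isFilename = false) {
  // Assumption: only non-filename names get their newlines normalized.
  // Lone CR and lone LF both become CR LF; existing CR LF pairs are kept.
  const text = isFilename ? name : name.replace(/\r\n|\r|\n/g, "\r\n");
  // TextEncoder always produces UTF-8 and replaces lone surrogates with
  // U+FFFD, which covers the conversion to a scalar value string.
  const encoded = new TextEncoder().encode(text);
  const out = [];
  for (const byte of encoded) {
    if (byte === 0x0a) out.push(0x25, 0x30, 0x41); // LF -> %0A
    else if (byte === 0x0d) out.push(0x25, 0x30, 0x44); // CR -> %0D
    else if (byte === 0x22) out.push(0x25, 0x32, 0x32); // " -> %22
    else out.push(byte);
  }
  return Uint8Array.from(out);
}
```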
<div algorithm="multipart/form-data chunk serializer">
The <dfn export><code>multipart/form-data</code> chunk serializer</dfn> takes an [=/entry list=]
|entries| and an optional [=/encoding=] |encoding| (default [=UTF-8=]), and returns a tuple of a
<a for="multipart/form-data" lt="boundary"><code>multipart/form-data</code> boundary</a> and a list
of chunks, each of which can be either a byte sequence or a {{File}}:
1. Set |encoding| to the result of [=getting an output encoding=] from |encoding|.
1. Let |boundary| be the result of [=multipart/form-data/boundary/generating=] a
<a for="multipart/form-data" lt="boundary"><code>multipart/form-data</code> boundary</a>.
1. Let |output chunks| be an empty list.
1. [=list/For each=] |entry| in |entries|:
1. Let |chunk| be a byte sequence containing `<code>--</code>`, followed by |boundary|,
followed by 0x0D 0x0A (CR LF).
1. Append `<code>Content-Disposition: form-data; name="</code>`, followed by the result of
<a>escaping a <code>multipart/form-data</code> name</a> given |entry|'s
[=entry list/entry/name=] and |encoding|, followed by 0x22 ("), to |chunk|.
1. Let |value| be |entry|'s [=entry list/entry/value=].
1. If |value| is a string:
1. Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to |chunk|.
1. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every
occurrence of U+000A (LF) not preceded by U+000D (CR), in |value|, by a string
consisting of U+000D (CR) and U+000A (LF).
1. Append the result of [=/encode|encoding=] |value| with |encoding| to |chunk|.
1. Append 0x0D 0x0A (CR LF) to |chunk|.
1. Append |chunk| to |output chunks|.
1. Otherwise:
1. [=Assert=]: |value| is a {{File}}.
1. Append `<code>; filename="</code>`, followed by the result of <a>escaping a
<code>multipart/form-data</code> name</a> given |value|'s {{File/name}} with |encoding|
and <a for="escape name"><var ignore>isFilename</var></a> set to true, followed by
0x22 0x0D 0x0A (" CR LF), to |chunk|.
1. Let |type| be |value|'s {{Blob/type}}, if it is not the empty string, or
"<code>application/octet-stream</code>" otherwise.
1. Append `<code>Content-Type: </code>`, followed by the result of [=isomorphic encoding=]
|type|, to |chunk|.
1. Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to |chunk|.
1. Append |chunk|, followed by |value|, followed by the byte sequence 0x0D 0x0A (CR LF), to
|output chunks|.
1. Append the byte sequence containing `<code>--</code>`, followed by |boundary|, followed by
`<code>--</code>`, followed by 0x0D 0x0A (CR LF), to |output chunks|.
1. Return the tuple |boundary| / |output chunks|.
</div>
<p class="note">This algorithm now matches the behavior of all major browsers.</p>
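<p class="note">A simplified, non-normative sketch of the chunk serializer follows. It assumes UTF-8 output, takes the boundary as a parameter instead of generating it, and treats any non-string value as a file-like object with <code>name</code> and <code>type</code> properties. The names <code>serializeChunks</code>, <code>escapeName</code> and <code>normalizeNewlines</code> are hypothetical. Escaping is done on strings before encoding, which gives the same bytes for UTF-8 because the escaped characters are ASCII.</p>

```javascript
const encoder = new TextEncoder();

// Lone CR and lone LF become CR LF; existing CR LF pairs are kept.
const normalizeNewlines = (s) => s.replace(/\r\n|\r|\n/g, "\r\n");

// Assumption: filenames are escaped but not newline-normalized.
function escapeName(name, isFilename = false) {
  const text = isFilename ? name : normalizeNewlines(name);
  return text.replace(/\n/g, "%0A").replace(/\r/g, "%0D").replace(/"/g, "%22");
}

// `entries` is an array of [name, value] pairs; `boundary` is an ASCII string.
// Returns a list of chunks: Uint8Arrays and file-like objects.
function serializeChunks(entries, boundary) {
  const chunks = [];
  for (const [name, value] of entries) {
    let head =
      `--${boundary}\r\nContent-Disposition: form-data; name="${escapeName(name)}"`;
    if (typeof value === "string") {
      head += "\r\n\r\n" + normalizeNewlines(value) + "\r\n";
      chunks.push(encoder.encode(head));
    } else {
      // Blob types are ASCII, so UTF-8 and isomorphic encoding coincide here.
      const type = value.type !== "" ? value.type : "application/octet-stream";
      head += `; filename="${escapeName(value.name, true)}"\r\n`;
      head += `Content-Type: ${type}\r\n\r\n`;
      chunks.push(encoder.encode(head), value, encoder.encode("\r\n"));
    }
  }
  // Closing delimiter.
  chunks.push(encoder.encode(`--${boundary}--\r\n`));
  return chunks;
}
```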
<hr>
<div algorithm="length of a multipart/form-data payload">
The <dfn export for="multipart/form-data">length</dfn> of a <a><code>multipart/form-data</code></a>
payload, given a list of chunks |chunks| which can be either byte sequences or {{File}}s, is the
result of running the following steps:
1. Let |length| be 0.
1. [=list/For each=] |chunk| in |chunks|:
1. If |chunk| is a byte sequence:
1. Increase |length| by |chunk|'s length.
1. Otherwise:
1. [=Assert=]: |chunk| is a {{File}}.
1. Increase |length| by |chunk|'s {{Blob/size}}.
1. Return |length|.
</div>
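<p class="note">The length computation can be sketched as below. This is a non-normative illustration; the function name <code>payloadLength</code> is hypothetical, byte sequences are assumed to be <code>Uint8Array</code> objects, and anything else is assumed to expose a <code>size</code> property as a <code>Blob</code> does.</p>

```javascript
// Hypothetical helper: sums the sizes of all chunks without reading any file.
function payloadLength(chunks) {
  let length = 0;
  for (const chunk of chunks) {
    // Byte sequences contribute their length; files contribute their size.
    length += chunk instanceof Uint8Array ? chunk.byteLength : chunk.size;
  }
  return length;
}
```

<p class="note">Because file contents are never read, the total is known before any data is streamed.</p>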
<div algorithm="create a multipart/form-data readable stream">
To <dfn export>create a <code>multipart/form-data</code> readable stream</dfn> from a list of chunks
|chunks| which can be either byte sequences or {{File}}s, run the following steps:
1. Let |file stream| be null.
1. Let |stream| be a [=new=] {{ReadableStream}}.
1. Let |pull algorithm| be an algorithm that runs the following steps:
<dl class="switch">
: if |file stream| is null and |chunks| is not empty
:: 1. If |chunks|[0] is a byte sequence, [=ReadableStream/enqueue=] a {{Uint8Array}} object
wrapping an {{ArrayBuffer}} containing |chunks|[0] into |stream|.
1. Otherwise:
1. [=Assert=]: |chunks|[0] is a {{File}} object.
1. Set |file stream| to the result of running |chunks|[0]'s {{Blob/stream}} method.
1. Run |pull algorithm|.
1. [=list/Remove=] the first item from |chunks|.
: if |file stream| is null and |chunks| is empty
:: 1. [=ReadableStream/Close=] |stream|.
: if |file stream| is not null
:: 1. Let |read request| be a new [=read request=] with the following [=struct/items=]:
: [=read request/chunk steps=], given |chunk|
:: 1. If |chunk| is not a {{Uint8Array}} object, [=ReadableStream/error=] |stream|
with a {{TypeError}} and abort these steps.
1. [=ReadableStream/Enqueue=] |chunk| into |stream|.
: [=read request/close steps=]
:: 1. Set |file stream| to null.
1. Run |pull algorithm|.
: [=read request/error steps=], given |e|
:: 1. [=ReadableStream/Error=] |stream| with |e|.
1. Let |reader| be the result of [=ReadableStream/getting a reader=] for |file stream|.
1. [=ReadableStreamDefaultReader/Read a chunk=] from |reader| with |read request|.
</dl>
1. Let |cancel algorithm| be an algorithm that runs the following steps, given |reason|:
1. If |file stream| is not null, [=ReadableStream/cancel=] |file stream| with |reason|.
1. [=ReadableStream/Set up=] |stream| with <a
for="ReadableStream/set up"><var ignore>pullAlgorithm</var></a> set to |pull algorithm| and <a
for="ReadableStream/set up"><var ignore>cancelAlgorithm</var></a> set to |cancel algorithm|.
1. Return |stream|.
</div>
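<p class="note">The stream creation steps could be approximated in script as follows. This non-normative sketch uses the <code>ReadableStream</code> constructor with an async <code>pull()</code> rather than the internal read request machinery, and assumes an environment where <code>ReadableStream</code> and <code>Blob.prototype.stream()</code> are available. The function name <code>createFormDataStream</code> is hypothetical.</p>

```javascript
// Hypothetical sketch: turns a list of Uint8Array and Blob/File chunks into
// a single ReadableStream of Uint8Array objects.
function createFormDataStream(chunks) {
  const queue = [...chunks]; // copy, so the caller's list is not mutated
  let fileReader = null; // reader for the file currently being streamed

  return new ReadableStream({
    async pull(controller) {
      while (true) {
        if (fileReader !== null) {
          // Forward the next piece of the current file, if any.
          const { done, value } = await fileReader.read();
          if (!done) {
            controller.enqueue(value);
            return;
          }
          fileReader = null; // file exhausted; move on to the next chunk
          continue;
        }
        if (queue.length === 0) {
          controller.close();
          return;
        }
        const next = queue.shift();
        if (next instanceof Uint8Array) {
          controller.enqueue(next);
          return;
        }
        // Otherwise the chunk is assumed to be a Blob or File.
        fileReader = next.stream().getReader();
      }
    },
    async cancel(reason) {
      if (fileReader !== null) await fileReader.cancel(reason);
    },
  });
}
```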
<code>multipart/form-data</code> parsing {#parsing}
===================================================
<div class="XXX">
<p>These algorithms are a first attempt at defining a <a><code>multipart/form-data</code></a>
parser for use in {{Body}}'s {{Body/formData()}} method. The current algorithms don't yet match
any browser, because browser behavior differs at various points.
<p>Note that Gecko and Chromium also implement a Web Extensions API that parses
<code>multipart/form-data</code> independently from the parser in {{Body}} (see <a
href="https://bugzilla.mozilla.org/show_bug.cgi?id=1697292">Gecko bug 1697292</a>):
<xmp class="lang-js" style="color: #666666">
chrome.webRequest.onBeforeRequest.addListener(
(details) => {
// Returns an object mapping names to an array of values represented by
// either the string value or by the file's filename.
console.log(details.requestBody.formData);
},
{urls: ["<all_urls>"]},
["requestBody"]
);
</xmp>
</div>
<div algorithm="multipart/form-data parser">
The <dfn export><code>multipart/form-data</code> parser</dfn> takes a byte sequence |input| and a
[=MIME type=] |mimeType|, and returns either an [=/entry list=] or failure:
1. [=Assert=]: |mimeType|'s [=essence=] is "<code>multipart/form-data</code>".
1. If |mimeType|'s [=MIME type/parameters=]["<code>boundary</code>"] does not [=map/exist=], return
failure. Otherwise, let |boundary| be the result of [=UTF-8 encode|UTF-8 encoding=] |mimeType|'s
[=MIME type/parameters=]["<code>boundary</code>"].
<p class="XXX">The definition of [=MIME type=] in [[MIMESNIFF]] has the [=MIME
type/parameter=] values being [=ASCII strings=], but the [=parse a MIME type=] algorithm can
create [=MIME type records=] containing non-ASCII parameter values. See <a
href="https://github.com/whatwg/mimesniff/issues/141">whatwg/mimesniff issue #141</a>. Gecko and
WebKit accept non-ASCII boundary strings and then expect them [=UTF-8 encoded=] in the request
body; Chromium rejects them instead.
1. Let |entry list| be an empty [=/entry list=].
1. Let |position| be a pointer to a byte in |input|, initially pointing at the first byte.
1. While true:
1. If |position| points to a sequence of bytes starting with 0x2D 0x2D (`<code>--</code>`)
followed by |boundary|, advance |position| by 2 + the length of |boundary|. Otherwise,
return failure.
1. If |position| points to the sequence of bytes 0x2D 0x2D 0x0D 0x0A (`<code>--</code>`
followed by CR LF) followed by the end of |input|, return |entry list|.
1. If |position| does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return
failure.
1. Advance |position| by 2. (This skips past the newline.)
1. Let |name|, |filename| and |contentType| be the result of <a>parsing
<code>multipart/form-data</code> headers</a> on |input| and |position|, if the result is not
failure. Otherwise, return failure.
1. Advance |position| by 2. (This skips past the empty line that marks the end of the headers.)
1. Let |body| be the empty byte sequence.
1. <i>Body loop</i>: While |position| is not past the end of |input|:
1. Append the byte at |position| to |body|, and advance |position| by 1.
1. If |body| ends with 0x0D 0x0A 0x2D 0x2D (CR LF followed by `<code>--</code>`) followed by
|boundary|:
1. Remove the last 4 + (length of |boundary|) bytes from |body|.
1. Decrease |position| by 4 + (length of |boundary|).
1. Break out of <i>body loop</i>.
1. If |position| does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return
failure. Otherwise, advance |position| by 2.
1. If |filename| is not null:
1. If |contentType| is null, set |contentType| to "<code>text/plain</code>".
1. If |contentType| is not an [=ASCII string=], set |contentType| to the empty string.
1. Let |value| be a new {{File}} object with name |filename|, type |contentType|, and body
|body|.
1. Otherwise:
1. Let |value| be the [=UTF-8 decode without BOM|UTF-8 decoding without BOM=] of |body|.
1. [=Assert=]: |name| is a [=scalar value string=] and |value| is either a [=scalar value string=]
or a {{File}} object.
1. [=entry list/Create an entry=] with |name| and |value|, and [=list/append=] it to
|entry list|.
</div>
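<p class="note">The body loop of the parser can be illustrated by the non-normative sketch below. It assumes that a part's body ends where the sequence CR LF <code>--</code> followed by the boundary begins, which is what the removal of 4 + (length of the boundary) bytes implies. The function name <code>readBody</code> is hypothetical, and both the input and the boundary are assumed to be <code>Uint8Array</code> objects.</p>

```javascript
// Hypothetical helper: scans `input` from `position` for the next delimiter.
// Returns the body bytes and the position of the CR that precedes the
// delimiter, or the end of input if no delimiter is found.
function readBody(input, position, boundary) {
  // Delimiter: CR LF "--" followed by the boundary bytes.
  const delimiter = Uint8Array.from([0x0d, 0x0a, 0x2d, 0x2d, ...boundary]);
  for (let i = position; i + delimiter.length <= input.length; i++) {
    if (delimiter.every((byte, j) => input[i + j] === byte)) {
      return { body: input.subarray(position, i), position: i };
    }
  }
  // No delimiter: the caller will fail when it does not find CR LF here.
  return { body: input.subarray(position), position: input.length };
}
```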
<div algorithm="parse multipart/form-data headers">
To <dfn>parse <code>multipart/form-data</code> headers</dfn>, given a byte sequence <var
ignore>input</var> and a pointer into it |position|, run the following steps:
1. Let |name|, |filename| and |contentType| be null.
1. While true:
1. If |position| points to a sequence of bytes starting with 0x0D 0x0A (CR LF):
1. If |name| is null, return failure.
1. Return |name|, |filename| and |contentType|.
1. Let |header name| be the result of collecting a sequence of bytes that are not 0x0A (LF),
0x0D (CR) or 0x3A (:), given |position|.
1. Remove any [=HTTP tab or space bytes=] from the start or end of |header name|.
1. If |header name| does not match the <a spec=http>field-name</a> token production, return
failure.
1. If the byte at |position| is not 0x3A (:), return failure.
1. Advance |position| by 1.
1. Collect a sequence of bytes that are [=HTTP tab or space bytes=] given |position|. (Do
nothing with those bytes.)
1. [=Byte-lowercase=] |header name| and switch on the result:
<dl class="switch">
: `<code>content-disposition</code>`
:: 1. Set |name| and |filename| to null.
1. If |position| does not point to a sequence of bytes starting with
`<code>form-data; name="</code>`, return failure.
1. Advance |position| so it points at the byte after the next 0x22 (") byte (the one in
the sequence of bytes matched above).
1. Set |name| to the [=UTF-8 decode without BOM|UTF-8 decoding without BOM=] of the
result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR) or 0x22
("), given |position|.
<p class=XXX>The parsing of names and filenames is being worked on in <a
href="http://github.com/andreubotella/multipart-form-data/issues/1">issue #1</a>.
1. If the byte at |position| is not 0x22 ("), return failure. Otherwise, advance
|position| by 1.
1. If |position| points to a sequence of bytes starting with
`<code>; filename="</code>`:
1. Advance |position| so it points at the byte after the next 0x22 (") byte (the
one in the sequence of bytes matched above).
1. Set |filename| to the [=UTF-8 decode without BOM|UTF-8 decoding without BOM=] of
the result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR)
or 0x22 ("), given |position|.
<p class=XXX>The parsing of names and filenames is being worked on in <a
href="http://github.com/andreubotella/multipart-form-data/issues/1">issue
#1</a>.
1. If the byte at |position| is not 0x22 ("), return failure. Otherwise, advance
|position| by 1.
: `<code>content-type</code>`
:: 1. Let |header value| be the result of collecting a sequence of bytes that are not 0x0A
(LF) or 0x0D (CR), given |position|.
1. Remove any [=HTTP tab or space bytes=] from the end of |header value|.
1. Set |contentType| to the [=isomorphic decoding=] of |header value|.
: Otherwise
:: Collect a sequence of bytes that are not 0x0A (LF) or 0x0D (CR), given |position|. (Do
nothing with those bytes.)
1. If |position| does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return
failure. Otherwise, advance |position| by 2 (past the newline).
</div>
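<p class="note">The header parsing steps can be sketched as follows. This is a non-normative illustration: the function name <code>parseHeaders</code> is hypothetical, failure is represented by <code>null</code>, and the returned <code>position</code> points at the CR LF of the empty line that ends the headers.</p>

```javascript
const CR = 0x0d, LF = 0x0a, COLON = 0x3a, QUOTE = 0x22;
// ignoreBOM: true keeps a leading BOM, as "UTF-8 decode without BOM" does.
const utf8 = new TextDecoder("utf-8", { ignoreBOM: true });
const isomorphicDecode = (bytes) => String.fromCharCode(...bytes);
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/; // HTTP token production

function parseHeaders(input, position) {
  let pos = position;
  // Collects bytes until `stop` returns true or the input ends.
  const collect = (stop) => {
    const start = pos;
    while (pos < input.length && !stop(input[pos])) pos++;
    return input.subarray(start, pos);
  };
  const startsWith = (text) =>
    [...text].every((c, i) => input[pos + i] === c.charCodeAt(0));
  const notInQuotes = (b) => b === LF || b === CR || b === QUOTE;

  let name = null, filename = null, contentType = null;
  while (true) {
    if (startsWith("\r\n")) {
      return name === null ? null : { name, filename, contentType, position: pos };
    }
    const headerName = isomorphicDecode(
      collect((b) => b === LF || b === CR || b === COLON)
    ).replace(/^[\t ]+|[\t ]+$/g, "");
    if (!TOKEN.test(headerName)) return null;
    if (input[pos] !== COLON) return null;
    pos++;
    collect((b) => b !== 0x09 && b !== 0x20); // skip tabs and spaces
    switch (headerName.toLowerCase()) {
      case "content-disposition":
        name = filename = null;
        if (!startsWith('form-data; name="')) return null;
        pos += 'form-data; name="'.length;
        name = utf8.decode(collect(notInQuotes));
        if (input[pos] !== QUOTE) return null;
        pos++;
        if (startsWith('; filename="')) {
          pos += '; filename="'.length;
          filename = utf8.decode(collect(notInQuotes));
          if (input[pos] !== QUOTE) return null;
          pos++;
        }
        break;
      case "content-type":
        contentType = isomorphicDecode(
          collect((b) => b === LF || b === CR)
        ).replace(/[\t ]+$/, "");
        break;
      default:
        collect((b) => b === LF || b === CR); // ignore other headers
    }
    if (!startsWith("\r\n")) return null;
    pos += 2; // past the newline
  }
}
```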