index.bs

<pre class='metadata'>
Group: WHATWG
H1: <code>multipart/form-data</code>
Title: multipart/form-data
Shortname: formdata
Status: DREAM
Text Macro: TWITTER (unused)
Text Macro: LOGO https://resources.whatwg.org/logo.svg
Metadata Include: Participate off, Commits off, Tests off
!Repository: <a href=https://github.com/andreubotella/multipart-form-data>GitHub</a>
Abstract: A web-spec definition of the <dfn export><code>multipart/form-data</code></dfn> format and
          related algorithms, meant for inclusion in the WHATWG standards.
</pre>

<pre class="anchors">
url:https://httpwg.org/specs/rfc7230.html#header.fields;text:field-name;type:dfn;spec:http
</pre>

<code>multipart/form-data</code> serializing {#serializing}
===========================================================

A <dfn export for="multipart/form-data" lt="boundary"><code>multipart/form-data</code>
boundary</dfn> is a [=byte sequence=] such that:

*   its length is greater or equal to 27 and lesser or equal to 70, and
*   it is composed by bytes in the ranges 0x30 to 0x39, 0x41 to 0x5A, or 0x61 to 0x7A, inclusive
    ([=ASCII alphanumeric=]), or which are 0x27 ('), 0x2D (-) or 0x5F (_).

To <dfn export for="multipart/form-data/boundary">generate</dfn> a <a
for="multipart/form-data" lt="boundary"><code>multipart/form-data</code> boundary</a>, return an
[=implementation-defined=] byte sequence which fullfills the conditions for boundaries, such that
part of it is randomly generated, with a minimum entropy of 95 bits.

<p class="note">Previous definitions of <a><code>multipart/form-data</code></a> <span
class="allow-2119">required</span> that the [=multipart/form-data/boundary=] associated with a
<code>multipart/form-data</code> payload not be present anywhere in the payload other than as a
delimiter, although they allow for generating the [=multipart/form-data/boundary=]
probabilistically. Since this generation algorithm is separate from a payload, however, it has to
specify a minimum entropy instead. [[RFC7578]] [[RFC2046]]

<p class="note">If a user agent generates <a><code>multipart/form-data</code></a> boundaries with a
length of 27 and an entropy of 95 bits, given a payload made specifically to generate collisions
with that user agent's boundaries, the expected length of the payload before a collision is found is
well over a yottabyte.

<hr>

<div algorithm="escape a multipart/form-data name">

To <dfn>escape a <code>multipart/form-data</code> name</dfn> with a string |name|, an optional
[=/encoding=] |encoding| (default [=UTF-8=]) and an optional boolean <dfn for="escape
name">|isFilename|</dfn> (default false):

1.  If |isFilename| is true:
    1.  Set |name| to the result of [=string/converting=] |name| into a [=scalar value string=].

1.  Otherwise:
    1.  [=Assert=]: |name| is a [=scalar value string=].
    1.  Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of
        U+000A (LF) not preceded by U+000D (CR), in |name|, by a string consisting of U+000D (CR)
        and U+000A (LF).
1.  Let |encoded| be the result of [=/encode|encoding=] |name| with |encoding|.
1.  Replace every 0x0A (LF) bytes in |encoded| with the byte sequence `<code>%0A</code>`, 0x0D (CR)
    with `<code>%0D</code>` and 0x22 (") with `<code>%22</code>`.
1.  Return |encoded|.

</div>

<div algorithm="multipart/form-data chunk serializer">

The <dfn export><code>multipart/form-data</code> chunk serializer</dfn> takes an [=/entry list=]
|entries| and an optional [=/encoding=] |encoding| (default [=UTF-8=]), and returns a tuple of a
<a for="multipart/form-data" lt="boundary"><code>multipart/form-data</code> boundary</a> and a list
of chunks, each of which can be either a byte sequence or a {{File}}:

1.  Set |encoding| to the result of [=getting an output encoding=] from |encoding|.
1.  Let |boundary| be the result of [=multipart/form-data/boundary/generating=] a
    <a for="multipart/form-data" lt="boundary"><code>multipart/form-data</code> boundary</a>.
1.  Let |output chunks| be an empty list.
1.  [=list/For each=] |entry| in |entries|:
    1.  Let |chunk| be a byte sequence containing `<code>--</code>`, followed by |boundary|,
        followed by 0x0D 0x0A (CR LF).
    1.  Append `<code>Content-Disposition: form-data; name="</code>`, followed by the result of
        <a>escaping a <code>multipart/form-data</code> name</a> given |entry|'s
        [=entry list/entry/name=] and |encoding|, followed by 0x22 ("), to |chunk|.
    1.  Let |value| be |entry|'s [=entry list/entry/value=].
    1.  If |value| is a string:
        1.  Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to |chunk|.
        1.  Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every
            occurrence of U+000A (LF) not preceded by U+000D (CR), in |value|, by a string
            consisting of U+000D (CR) and U+000A (LF).
        1.  Append the result of [=/encode|encoding=] |value| with |encoding| to |chunk|.
        1.  Append 0x0D 0x0A (CR LF) to |chunk|.
        1.  Append |chunk| to |output chunks|.

    1.  Otherwise:
        1.  [=Assert=]: |value| is a {{File}}.
        1.  Append `<code>; filename="</code>`, followed by the result of <a>escaping a
            <code>multipart/form-data</code> name</a> given |value|'s {{File/name}} with |encoding|
            and <a for="escape name"><var ignore>isFilename</var></a> set to true, followed by
            0x22 0x0D 0x0A (" CR LF), to |chunk|.
        1.  Let |type| be |value|'s {{Blob/type}}, if it is not the empty string, or
            "<code>application/octet-stream</code>" otherwise.
        1.  Append `<code>Content-Type: </code>`, followed by the result of [=isomorphic encoding=]
            |type|, to |chunk|.
        1.  Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to |chunk|.
        1.  Append |chunk|, followed by |value|, followed by the byte sequence 0x0D 0x0A (CR LF), to
            |output chunks|.
1.  Append the byte sequence containing `<code>--</code>`, followed by |boundary|, followed by
    `<code>--</code>`, followed by 0x0D 0x0A (CR LF), to |output chunks|.
1.  Return the tuple |boundary| / |output chunks|.

</div>

<p class="note">This algorithm now matches the behavior of all major browsers.</p>

<hr>

<div algorithm="length of a multipart/form-data payload">

The <dfn export for="multipart/form-data">length</dfn> of a <a><code>multipart/form-data</code></a>
payload, given a list of chunks |chunks| which can be either byte sequences or {{File}}s, is the
result of running the following steps:

1.  Let |length| be 0.
1.  [=list/For each=] |chunk| in |chunks|:
    1.  If |chunk| is a byte sequence:
        1.  Increase |length| by |chunk|'s length.

    1.  Otherwise:
        1.  [=Assert=]: |chunk| is a {{File}}.
        1.  Increase |length| by |chunk|'s {{Blob/size}}.
1.  Return |length|.

</div>

<div algorithm="create a multipart/form-data readable stream">

To <dfn export>create a <code>multipart/form-data</code> readable stream</dfn> from a list of chunks
|chunks| which can be either byte sequences or {{File}}s, run the following steps:

1.  Let |file stream| be null.
1.  Let |stream| be a [=new=] {{ReadableStream}}.
1.  Let |pull algorithm| be an algorithm that runs the following steps:
    <dl class="switch">
      : if |file stream| is null and |chunks| is not empty
      ::  1.  If |chunks|[0] is a byte sequence, [=ReadableStream/enqueue=] a {{Uint8Array}} object
              wrapping an {{ArrayBuffer}} containing |chunks|[0] into |stream|.
          1.  Otherwise:
              1.  [=Assert=]: |chunks|[0] is a {{File}} object.
              1.  Set |file stream| to the result of running |chunks|[0]'s {{Blob/stream}} method.
              1.  Run |pull algorithm|.
          1.  [=list/Remove=] the first item from |chunks|.
      : if |file stream| is null and |chunks| is empty
      ::  1.  [=ReadableStream/Close=] |stream|.
      : if |file stream| is not null
      ::  1.  Let |read request| be a new [=read request=] with the following [=struct/items=]:
              : [=read request/chunk steps=], given |chunk|
              ::  1.  If |chunk| is not a {{Uint8Array}} object, [=ReadableStream/error=] |stream|
                      with a {{TypeError}} and abort these steps.
                  1.  [=ReadableStream/Enqueue=] |chunk| into |stream|.
              : [=read request/close steps=]
              ::  1.  Set |file stream| to null.
                  1.  Run |pull algorithm|.
              : [=read request/error steps=], given |e|
              ::  1.  [=ReadableStream/Error=] |stream| with |e|.
          1.  Let |reader| be the result of [=ReadableStream/getting a reader=] for |file stream|.
          1.  [=ReadableStreamDefaultReader/Read a chunk=] from |reader| with |read request|.

    </dl>
1.  Let |cancel algorithm| be an algorithm that runs the following steps, given |reason|:
    1.  If |file stream| is not null, [=ReadableStream/cancel=] |file stream| with |reason|.
1.  [=ReadableStream/Set up=] |stream| with <a
    for="ReadableStream/set up"><var ignore>pullAlgorithm</var></a> set to |pull algorithm| and <a
    for="ReadableStream/set up"><var ignore>cancelAlgorithm</a> set to |cancel algorithm|.
1.  Return |stream|.

</div>

<code>multipart/form-data</code> parsing {#parsing}
===================================================

<div class="XXX">
  <p>These algorithms are a first attempt at defining a <a><code>multipart/form-data</code></a>
  parser for use in {{Body}}'s {{Body/formData()}} method. The current algorithms don't yet match
  any browser because their behavior disagrees at various points.

  <p>Note that Gecko and Chromium also implement a Web Extensions API that parses
  <code>multipart/form-data</code> independently from the parser in {{Body}} (see <a
  href="https://bugzilla.mozilla.org/show_bug.cgi?id=1697292">Gecko bug 1697292</a>):

  <xmp class="lang-js" style="color: #666666">
    chrome.webRequest.onBeforeRequest.addListener(
      (details) => {
        // Returns an object mapping names to an array of values represented by
        // either the string value or by the file's filename.
        console.log(details.requestBody.formData);
      },
      {urls: ["<all_urls>"]},
      ["requestBody"]
    );
  </xmp>
</div>

<div algorithm="multipart/form-data parser">

The <dfn export><code>multipart/form-data</code> parser</dfn> takes a byte sequence |input| and a
[=MIME type=] |mimeType|, and returns either an [=/entry list=] or failure:

1.  [=Assert=]: |mimeType|'s [=essence=] is "<code>multipart/form-data</code>".
1.  If |mimeType|'s [=MIME type/parameters=]["<code>boundary</code>"] does not [=map/exist=], return
    failure. Otherwise, let |boundary| be the result of [=UTF-8 decoding=] |mimeType|'s [=MIME
    type/parameters=]["<code>boundary</code>"].

    <p class="XXX">The definition of [=MIME type=] in [[MIMESNIFF]] has the [=MIME
    type/parameter=] values being [=ASCII strings=], but the [=parse a MIME type=] algorithm can
    create [=MIME type records=] containing non-ASCII parameter values. See <a
    href="https://github.com/whatwg/mimesniff/issues/141">whatwg/mimesniff issue #141</a>. Gecko and
    WebKit accept non-ASCII boundary strings and then expect them [=UTF-8 encoded=] in the request
    body; Chromium rejects them instead.
1.  Let |entry list| be an empty [=/entry list=].
1.  Let |position| be a pointer to a byte in |input|, initially pointing at the first byte.
1.  While true:
    1.  If |position| points to a sequence of bytes starting with 0x2D 0x2D (`<code>--</code>`)
        followed by |boundary|, advance |position| by 2 + the length of |boundary|. Otherwise,
        return failure.
    1.  If |position| points to the sequence of bytes 0x2D 0x2D 0x0D 0x0A (`<code>--</code>`
        followed by CR LF) followed by the end of |input|, return |entry list|.
    1.  If |position| does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return
        failure.
    1.  Advance |position| by 2. (This skips past the newline.)
    1.  Let |name|, |filename| and |contentType| be the result of <a>parsing
        <code>multipart/form-data</code> headers</a> on |input| and |position|, if the result is not
        failure. Otherwise, return failure.
    1.  Advance |position| by 2. (This skips past the empty line that marks the end of the headers.)
    1.  Let |body| be the empty byte sequence.
    1.  <i>Body loop</i>: While |position| is not past the end of |input|:
        1.  Append the code point at |position| to |body|.
        1.  If |body| ends with |boundary|:
            1.  Remove the last 4 + (length of |boundary|) bytes from |body|.
            1.  Decrease |position| by 4 + (length of |boundary|).
            1.  Break out of <i>body loop</i>.
    1.  If |position| does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return
        failure. Otherwise, advance |position| by 2.
    1.  If |filename| is not null:
        1.  If |contentType| is null, set |contentType| to "<code>text/plain</code>".
        1.  If |contentType| is not an [=ASCII string=], set |contentType| to the empty string.
        1.  Let |value| be a new {{File}} object with name |filename|, type |contentType|, and body
            |body|.

    1.  Otherwise:
        1.  Let |value| be the [=UTF-8 decode without BOM|UTF-8 decoding without BOM=] of |body|.
    1.  Assert: |name| is a [=scalar value string=] and |value| is either a [=scalar value string=]
        or a {{File}} object.
    1.  [=entry list/Create an entry=] with |name| and |value|, and [=list/append=] it to
        |entry list|.

</div>

<div algorithm="parse multipart/form-data headers">

To <dfn>parse <code>multipart/form-data</code> headers</dfn>, given a byte sequence <var
ignore>input</var> and a pointer into it |position|, run the following steps:

1.  Let |name|, |filename| and |contentType| be null.
1.  While true:
    1.  If |position| points to a sequence of bytes starting with 0x0D 0x0A (CR LF):
        1.  If |name| is null, return failure.
        1.  Return |name|, |filename| and |contentType|.
    1.  Let |header name| be the result of collecting a sequence of bytes that are not 0x0A (LF),
        0x0D (CR) or 0x3A (:), given |position|.
    1.  Remove any [=HTTP tab or space bytes=] from the start or end of |header name|.
    1.  If |header name| does not match the <a spec=http>field-name</a> token production, return
        failure.
    1.  If the byte at |position| is not 0x3A (:), return failure.
    1.  Advance |position| by 1.
    1.  Collect a sequence of bytes that are [=HTTP tab or space bytes=] given |position|. (Do
        nothing with those bytes.)
    1.  [=Byte-lowercase=] |header name| and switch on the result:
        <dl class="switch">
        : `<code>content-disposition</code>`
        ::  1.  Set |name| and |filename| to null.
            1.  If |position| does not point to a sequence of bytes starting with
                `<code>form-data; name="</code>`, return failure.
            1.  Advance |position| so it points at the byte after the next 0x22 (") byte (the one in
                the sequence of bytes matched above).
            1.  Set |name| to the [=UTF-8 decode without BOM|UTF-8 decoding without BOM=] of the
                result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR) or 0x22
                ("), given |position|.

                <p class=XXX>The parsing of names and filenames is being worked on in <a
                href="http://github.com/andreubotella/multipart-form-data/issues/1">issue #1</a>.
            1.  If the byte at |position| is not 0x22 ("), return failure. Otherwise, advance
                |position| by 1.
            1.  If |position| points to a sequence of bytes starting with
                `<code>; filename="</code>`:
                1.  Advance |position| so it points at the byte after the next 0x22 (") byte (the
                    one in the sequence of bytes matched above).
                1.  Set |filename| to the [=UTF-8 decode without BOM|UTF-8 decoding without BOM=] of
                    the result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR)
                    or 0x22 ("), given |position|.

                    <p class=XXX>The parsing of names and filenames is being worked on in <a
                    href="http://github.com/andreubotella/multipart-form-data/issues/1">issue
                    #1</a>.
                1.  If the byte at |position| is not 0x22 ("), return failure. Otherwise, advance
                    |position| by 1.
        : `<code>content-type</code>`
        ::  1.  Let |header value| be the result of collecting a sequence of bytes that are not 0x0A
                (LF) or 0x0D (CR), given |position|.
            1.  Remove any [=HTTP tab or space bytes=] from the end of |header value|.
            1.  Set |contentType| to the [=isomorphic decoding=] of |header value|.
        : Otherwise
        ::  Collect a sequence of bytes that are not 0x0A (LF) or 0x0D (CR), given |position|. (Do
            nothing with those bytes.)
    1.  If |position| does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return
        failure. Otherwise, advance |position| by 2 (past the newline).

</div>