From 216b922bf37fbc2a3d9643439d5253e5a8c3285b Mon Sep 17 00:00:00 2001 From: Max Ostapenko <1611259+max-ostapenko@users.noreply.github.com> Date: Mon, 22 Jul 2024 01:40:04 +0200 Subject: [PATCH] More struct and metrics descriptions (#12) * summary struct descriptions * custom metrics struct * added schema table * privacy metrics * ads metrics * Update src/content/docs/reference/tables/requests.mdx Co-authored-by: Rick Viscomi * Update src/content/docs/reference/tables/requests.mdx Co-authored-by: Rick Viscomi * schemas update --------- Co-authored-by: Rick Viscomi --- .../{page-metdata.mdx => page-metadata.mdx} | 4 +- .../docs/reference/custom-metrics/ads.md | 162 ++++++ .../docs/reference/custom-metrics/privacy.md | 125 +++++ .../docs/reference/structs/custom-metrics.mdx | 272 ++++++++- src/content/docs/reference/structs/header.mdx | 7 +- .../docs/reference/structs/page-summary.md | 519 ++++++++++++++++++ .../docs/reference/structs/request-summary.md | 243 ++++++++ src/content/docs/reference/tables/pages.mdx | 4 +- .../docs/reference/tables/requests.mdx | 7 + 9 files changed, 1332 insertions(+), 11 deletions(-) rename src/content/docs/reference/blobs/{page-metdata.mdx => page-metadata.mdx} (97%) create mode 100644 src/content/docs/reference/custom-metrics/ads.md create mode 100644 src/content/docs/reference/custom-metrics/privacy.md create mode 100644 src/content/docs/reference/structs/page-summary.md create mode 100644 src/content/docs/reference/structs/request-summary.md diff --git a/src/content/docs/reference/blobs/page-metdata.mdx b/src/content/docs/reference/blobs/page-metadata.mdx similarity index 97% rename from src/content/docs/reference/blobs/page-metdata.mdx rename to src/content/docs/reference/blobs/page-metadata.mdx index 5b3c3af..d7c9f1b 100644 --- a/src/content/docs/reference/blobs/page-metdata.mdx +++ b/src/content/docs/reference/blobs/page-metadata.mdx @@ -67,7 +67,7 @@ From the list of candidates, the link with the largest hit area is selected to 
b At a given crawl depth, this value represents the index in the list of pages being tested. Currently, HTTP Archive only crawls one page per level, so this value is always `0`. -Hypothetically, HTTP Archive can crawl multiple pages per level. For example, at crawl depth `0`, the page is the root. Given that there's only one root page, the `link_depth` would be `0`. At crawl depth `1`, there may be many secondary page candidates. Instead of testing only one of them, HTTP Archive could test multiple secondary pages that are all linked from the root page. These pages would have a `link_depth` of `0`, `1`, `2`, etc, where the smaller indexes represent the pages that are more prominently linked from the preceding page. +Hypothetically, HTTP Archive can crawl multiple pages per level. For example, at crawl depth `0`, the page is the root. Given that there's only one root page, the `link_depth` would be `0`. At crawl depth `1`, there may be many secondary page candidates. Instead of testing only one of them, HTTP Archive could test multiple secondary pages that are all linked from the root page. These pages would have a `link_depth` of `0`, `1`, `2`, etc, where the smaller indexes represent the pages that are more prominently linked from the preceding page. ### `root_page_id` @@ -79,4 +79,4 @@ URL of the root page. ### `root_page_test_id` -WebPageTest ID of the root page. \ No newline at end of file +WebPageTest ID of the root page. 
diff --git a/src/content/docs/reference/custom-metrics/ads.md b/src/content/docs/reference/custom-metrics/ads.md new file mode 100644 index 0000000..e121df0 --- /dev/null +++ b/src/content/docs/reference/custom-metrics/ads.md @@ -0,0 +1,162 @@ +--- +title: Ads custom metric +description: Reference docs for the feature struct +--- + +_Appears in: [`custom_metrics` struct](/reference/structs/custom-metrics/)_\ +_As: [`ads`](/reference/structs/custom-metrics/#ads)_ + +## Schema + +| Field name | Type | Description | +| ------------------------------------------------ | ------------- | -------------------------------------------------------------------------------------------- | +| `ads` | object | Contains information about the ads.txt file. | +| `ads.present` | boolean | Indicates if the ads.txt file is present. | +| `ads.status` | integer | HTTP status code of the ads.txt file response. | +| `ads.redirected` | boolean | Indicates if the ads.txt file request was redirected. | +| `ads.redirected_to` | string | URL to which the ads.txt resource was redirected. | +| `ads.account_count` | integer | Number of advertising accounts listed in the ads.txt file. | +| `ads.account_types` | object | Types of accounts (direct or reseller) listed in the ads.txt file. | +| `ads.account_types.direct` | object | Information about direct advertising accounts. | +| `ads.account_types.direct.domains` | array | List of domains with direct advertising accounts. | +| `ads.account_types.direct.account_count` | integer | Number of direct advertising accounts. | +| `ads.account_types.direct.domain_count` | integer | Number of unique domains with direct advertising accounts. | +| `ads.account_types.reseller` | object | Information about reseller advertising accounts. | +| `ads.account_types.reseller.domains` | array | List of domains with reseller advertising accounts. | +| `ads.account_types.reseller.account_count` | integer | Number of reseller advertising accounts. 
| +| `ads.account_types.reseller.domain_count` | integer | Number of unique domains with reseller advertising accounts. | +| `ads.line_count` | integer | Total number of lines in the ads.txt file. | +| `ads.variables` | array | List of variables found in the ads.txt file. | +| `ads.variable_count` | integer | Number of variables found in the ads.txt file. | +| `app_ads` | object | Contains information about the app-ads.txt file. | +| `app_ads.present` | boolean | Indicates if the app-ads.txt file is present. | +| `app_ads.status` | integer | HTTP status code of the app-ads.txt file response. | +| `app_ads.redirected` | boolean | Indicates if the app-ads.txt file request was redirected. | +| `app_ads.redirected_to` | string | URL to which the app-ads.txt resource was redirected. | +| `app_ads.account_count` | integer | Number of advertising accounts listed in the app-ads.txt file. | +| `app_ads.account_types` | object | Types of accounts (direct or reseller) listed in the app-ads.txt file. | +| `app_ads.account_types.direct` | object | Information about direct advertising accounts. | +| `app_ads.account_types.direct.domains` | array | List of domains with direct advertising accounts. | +| `app_ads.account_types.direct.account_count` | integer | Number of direct advertising accounts. | +| `app_ads.account_types.direct.domain_count` | integer | Number of unique domains with direct advertising accounts. | +| `app_ads.account_types.reseller` | object | Information about reseller advertising accounts. | +| `app_ads.account_types.reseller.domains` | array | List of domains with reseller advertising accounts. | +| `app_ads.account_types.reseller.account_count` | integer | Number of reseller advertising accounts. | +| `app_ads.account_types.reseller.domain_count` | integer | Number of unique domains with reseller advertising accounts. | +| `app_ads.line_count` | integer | Total number of lines in the app-ads.txt file. 
| +| `app_ads.variables` | array | List of variables found in the app-ads.txt file. | +| `app_ads.variable_count` | integer | Number of variables found in the app-ads.txt file. | +| `sellers` | object | Contains information about the sellers.json file. | +| `sellers.present` | boolean | Indicates if the sellers.json file is present. | +| `sellers.status` | integer | HTTP status code of the sellers.json file response. | +| `sellers.redirected` | boolean | Indicates if the sellers.json file request was redirected. | +| `sellers.redirected_to` | string | URL to which the sellers.json resource was redirected. | +| `sellers.seller_count` | integer | Number of sellers listed in the sellers.json file. | +| `sellers.seller_types` | object | Types of sellers (publisher, intermediary, both) listed in the sellers.json file. | +| `sellers.seller_types.publisher` | object | Information about publisher sellers. | +| `sellers.seller_types.publisher.domains` | array | List of domains associated with publisher sellers. | +| `sellers.seller_types.publisher.seller_count` | integer | Number of publisher sellers. | +| `sellers.seller_types.publisher.domain_count` | integer | Number of unique domains associated with publisher sellers. | +| `sellers.seller_types.intermediary` | object | Information about intermediary sellers. | +| `sellers.seller_types.intermediary.domains` | array | List of domains associated with intermediary sellers. | +| `sellers.seller_types.intermediary.seller_count` | integer | Number of intermediary sellers. | +| `sellers.seller_types.intermediary.domain_count` | integer | Number of unique domains associated with intermediary sellers. | +| `sellers.seller_types.both` | object | Information about sellers who are both publishers and intermediaries. | +| `sellers.seller_types.both.domains` | array | List of domains associated with sellers who are both publishers and intermediaries. 
| +| `sellers.seller_types.both.seller_count` | integer | Number of sellers who are both publishers and intermediaries. | +| `sellers.seller_types.both.domain_count` | integer | Number of unique domains associated with sellers who are both publishers and intermediaries. | +| `sellers.passthrough_count` | integer | Number of passthrough sellers listed in the sellers.json file. | +| `sellers.confidential_count` | integer | Number of confidential sellers listed in the sellers.json file. | + +Here's an example of the decoded object from `https://www.amazon.com/` page crawl: + +```json +{ + "ads": { + "present": true, + "status": 200, + "redirected": false, + "account_count": 1, + "account_types": { + "direct": { + "domains": [ + "placeholder.example.com" + ], + "account_count": 1, + "domain_count": 1 + }, + "reseller": { + "domains": [], + "account_count": 0, + "domain_count": 0 + } + }, + "line_count": 10, + "variables": [], + "variable_count": 0 + }, + "app_ads": { + "present": true, + "status": 200, + "redirected": false, + "account_count": 1, + "account_types": { + "direct": { + "domains": [ + "placeholder.example.com" + ], + "account_count": 1, + "domain_count": 1 + }, + "reseller": { + "domains": [], + "account_count": 0, + "domain_count": 0 + } + }, + "line_count": 10, + "variables": [], + "variable_count": 0 + }, + "sellers": { + "present": true, + "redirected": true, + "status": 200, + "seller_count": 2732, + "seller_types": { + "publisher": { + "domains": [ + "cumuli.com", + "realself.com", + "trendscatchers.io", + ... + ], + "seller_count": 2199, + "domain_count": 1923 + }, + "intermediary": { + "domains": [ + "bidsxchange.com", + "vuukle.com", + "vdo.ai", + ... + ], + "seller_count": 232, + "domain_count": 172 + }, + "both": { + "domains": [ + "gourmetads.com", + "freestar.com", + "shinez.io", + ... 
+ ], + "seller_count": 148, + "domain_count": 134 + } + }, + "passthrough_count": 0, + "confidential_count": 2 + } +} +``` diff --git a/src/content/docs/reference/custom-metrics/privacy.md b/src/content/docs/reference/custom-metrics/privacy.md new file mode 100644 index 0000000..ec84277 --- /dev/null +++ b/src/content/docs/reference/custom-metrics/privacy.md @@ -0,0 +1,125 @@ +--- +title: Privacy custom metric +description: Reference docs for the feature struct +--- + +_Appears in: [`custom_metrics` struct](/reference/structs/custom-metrics/)_\ +_As: [`privacy`](/reference/structs/custom-metrics/#privacy)_ + +## Schema + +| Field name | Type | Description | +| -------------------------------------------- | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `privacy_wording_links` | array | Links related to privacy policy. | +| `privacy_wording_links[i].text` | string | Title of the link. | +| `iab_tcf_v1` | object | IAB TCF v1 settings. | +| `iab_tcf_v1.present` | boolean | Presence of IAB TCF v1. | +| `iab_tcf_v1.data` | object | TCF v1 vendor consents. [VendorConsents](https://github.com/InteractiveAdvertisingBureau/GDPR-Transparency-and-Consent-Framework/blob/master/CMP%20JS%20API%20v1.1%20Final.md#vendorconsents-) | +| `iab_tcf_v1.compliant_setup` | boolean | Verifies compliance of TCF v1 vendor consents. | +| `iab_tcf_v2` | object | IAB TCF v2 settings. | +| `iab_tcf_v2.present` | boolean | Presence of IAB TCF v2. | +| `iab_tcf_v2.data` | object | TCF v2 vendor consents. [TCData](https://github.com/InteractiveAdvertisingBureau/GDPR-Transparency-and-Consent-Framework/blob/master/TCFv2/IAB%20Tech%20Lab%20-%20CMP%20API%20v2.md#tcdata) | +| `iab_tcf_v2.compliant_setup` | boolean | Verifies compliance of TCF v2 vendor consents. | +| `iab_usp` | object | Shows the presence of IAB U.S. Privacy String. 
| +| `iab_usp.present` | boolean | Shows the presence of IAB U.S. Privacy String. | +| `iab_usp.privacy_string` | string | IAB U.S. Privacy String. | +| `navigator_doNotTrack` | boolean | Indicates whether the browser's "Do Not Track" setting is enabled. | +| `navigator_globalPrivacyControl` | boolean | Indicates whether the browser's Global Privacy Control setting is enabled. | +| `document_permissionsPolicy` | boolean | Indicates the presence of the Permissions Policy. | +| `document_featurePolicy` | boolean | Indicates the presence of the Feature Policy. | +| `referrerPolicy` | object | Specifies the referrer policy for the entire document and individual requests. | +| `referrerPolicy.entire_document_policy` | string | Referrer policy for the entire document. | +| `referrerPolicy.individual_requests` | string | Referrer policy for individual requests. | +| `referrerPolicy.link_relations` | string | Referrer policy for link relations. | +| `media_devices` | object | Tracks the usage of media device APIs like `enumerateDevices` and `getUserMedia`. | +| `media_devices["API_NAME"]` | boolean | Indicates usage of a particular API. | +| `geolocation` | object | Tracks the usage of geolocation APIs like `getCurrentPosition` and `watchPosition`. | +| `geolocation["API_NAME"]` | boolean | Indicates usage of a particular API. | +| `fingerprinting` | object | Tracks potential fingerprinting attempts by counting API calls and listing likely fingerprinting scripts. | +| `fingerprinting.counts` | object | Counts of fingerprinting-related API calls. | +| `fingerprinting.counts["API_NAME"]` | integer | Count for a particular fingerprinting-related API. | +| `fingerprinting.likelyFingerprintingScripts` | array | List of likely fingerprinting script URLs. | +| `request_hostnames_with_cname` | object | Lists hostnames with their corresponding CNAME records. | +| `request_hostnames_with_cname["HOSTNAME"]` | array | CNAME records for a given hostname. 
| +| `ccpa_link` | object | California Consumer Privacy Act (CCPA) compliance. | +| `ccpa_link.hasCCPALink` | boolean | Presence of a CCPA link. | +| `ccpa_link.CCPALinkPhrases` | array | Related CCPA link phrases. | + +Here's an example of the decoded object from `https://www.google.com/` page crawl: + +```json +{ + "privacy_wording_links": [ + { + "text": "Privacy" + } + ], + "iab_tcf_v1": { + "present": false, + "data": null, + "compliant_setup": null + }, + "iab_tcf_v2": { + "present": false, + "data": null, + "compliant_setup": null + }, + "iab_usp": { + "present": false, + "privacy_string": null + }, + "navigator_doNotTrack": false, + "navigator_globalPrivacyControl": false, + "document_permissionsPolicy": false, + "document_featurePolicy": false, + "referrerPolicy": { + "entire_document_policy": "origin", + "individual_requests": null, + "link_relations": null + }, + "media_devices": { + "navigator_mediaDevices_enumerateDevices": false, + "navigator_mediaDevices_getUserMedia": true, + "navigator_mediaDevices_getDisplayMedia": false + }, + "geolocation": { + "navigator_geolocation_getCurrentPosition": false, + "navigator_geolocation_watchPosition": false + }, + "fingerprinting": { + "counts": { + "prefers-contrast": 4, + "forced-colors": 15, + "devicememory": 1, + "hardwareconcurrency": 2, + "localstorage": 5, + "screen.width": 7, + "screen.height": 5, + "sessionstorage": 1, + "gettimezoneoffset": 5, + "maxtouchpoints": 5, + "ontouchstart": 5, + "navigator.vendor": 1, + "getchanneldata": 4, + "navigator.platform": 1 + }, + "likelyFingerprintingScripts": [ + "https://www.google.com/", + "https://www.gstatic.com/og/_/js/k=og.qtm.en_US.ftxzKLuybBw.2019.O/rt=j/m=qabr,q_d,qcwid,qapid,qald,q_dg/exm=qaaw,qadd,qaid,qein,qhaw,qhba,qhbr,qhch,qhga,qhid,qhin/d=1/ed=1/rs=AA2YrTsOEv0aSAP39vut5xzjLXfdU4aRbQ", + ... 
+ ] + }, + "request_hostnames_with_cname": { + "ogs.google.com": [ + "www3.l.google.com" + ], + "apis.google.com": [ + "plus.l.google.com" + ] + }, + "ccpa_link": { + "hasCCPALink": false, + "CCPALinkPhrases": [] + } +} +``` diff --git a/src/content/docs/reference/structs/custom-metrics.mdx b/src/content/docs/reference/structs/custom-metrics.mdx index ec015a1..1849402 100644 --- a/src/content/docs/reference/structs/custom-metrics.mdx +++ b/src/content/docs/reference/structs/custom-metrics.mdx @@ -3,14 +3,276 @@ title: Custom metrics struct description: Reference docs for the custom metrics struct --- -_Appears in: [`pages` table](/reference/tables/pages/)_ - -TODO +_Appears in: [`pages` table](/reference/tables/pages/)_\ +_As: [`custom_metrics`](/reference/tables/pages/#custom_metrics)_ ## Schema -TODO +| Field name | Type | Description | +|--------------------|---------|---------------------------------------------------| +| `a11y` | String | Accessibility. | +| `ads` | String | Advertising technology and usage. | +| `almanac` | String | Metrics defined in the early versions of Web Almanac crawls. | +| `aurora` | String | Project Aurora. | +| `avg_dom_depth` | Integer | The average DOM depth of a page. | +| `cms` | String | Content Management Systems. | +| `Colordepth` | String | Color depth of a screen. | +| `cookies` | String | Cookie usage. | +| `crawl_links` | String | The links found during a crawl. | +| `css` | String | CSS usage. | +| `css-variables` | String | Use of CSS variables. | +| `doctype` | String | Document type declaration. | +| `document_height` | Integer | Height of the document. | +| `document_width` | Integer | Width of the document. | +| `Dpi` | Integer | Dots per inch (DPI) of a screen. | +| `event-names` | String | Event names used in JavaScript. | +| `fugu-apis` | String | Usage of Fugu APIs. | +| `generated-content`| String | Client-side generated content. | +| `has_shadow_root` | String | Presence of shadow DOM roots. 
| +| `Images` | String | Images usage. | +| `img-loading-attr` | String | Image loading attributes. | +| `initiators` | String | Resource initiators. | +| `inline_style_bytes`| Integer | Size of inline styles. | +| `javascript` | String | JavaScript usage. | +| `lib-detector-version`| String | Libraries detector version. | +| `localstorage_size`| Integer | Size of local storage. | +| `markup` | String | HTML markup. | +| `media` | String | Media elements. | +| `meta_viewport` | String | Meta viewport tag. | +| `num_iframes` | Integer | Number of iframes on a page. | +| `num_scripts` | Integer | Number of script tags. | +| `num_scripts_async`| Integer | Number of asynchronous scripts. | +| `num_scripts_sync` | Integer | Number of synchronous scripts. | +| `observers` | String | Metrics related to the usage of observer APIs. | +| `performance` | String | Web performance. | +| `privacy-sandbox` | String | Privacy Sandbox initiative usage. | +| `privacy` | String | Privacy settings and policies. | +| `pwa` | String | Progressive Web Apps. | +| `quirks_mode` | String | Usage of quirks mode in browsers. | +| `Resolution` | Integer | Resolution of a screen. | +| `responsive_images`| String | Responsive image techniques. | +| `robots_meta` | String | Robots meta tag. | +| `robots_txt` | String | robots.txt file. | +| `sass` | String | Usage of Sass. | +| `security` | String | Security features. | +| `sessionstorage_size`| Integer | Size of session storage. | +| `structured-data` | String | Structured data. | +| `third-parties` | String | Third-party resources. | +| `usertiming` | String | User Timing API. | +| `valid-head` | String | Validity of the head element. | +| `well-known` | String | well-known URIs. | +| `wpt_bodies` | String | Metrics derived from WebPageTest bodies object. | + +### `a11y` + +Accessibility. + +### `ads` + +Advertising technology and usage. + +See the [`ads` custom metric](/reference/custom-metrics/ads/) for more information. 
+ +### `almanac` + +Metrics defined in the early versions of Web Almanac crawls. + +### `aurora` + +Project Aurora. + +### `avg_dom_depth` + +The average DOM depth of a page. + +### `cms` + +Content Management Systems. + +### `Colordepth` + +Color depth of a screen. + +### `cookies` + +Cookie usage. + +### `crawl_links` + +The links found during a crawl. + +### `css` + +CSS usage. + +### `css-variables` + +Use of CSS variables. + +### `doctype` + +Document type declaration. + +### `document_height` + +Height of the document. + +### `document_width` + +Width of the document. + +### `Dpi` + +Dots per inch (DPI) of a screen. + +### `event-names` + +Event names used in JavaScript. + +### `fugu-apis` + +Usage of Fugu APIs. + +### `generated-content` + +Client-side generated content. + +### `has_shadow_root` + +Presence of shadow DOM roots. + +### `Images` + +Images usage. + +### `img-loading-attr` + +Image loading attributes. + +### `initiators` + +Resource initiators. + +### `inline_style_bytes` + +Size of inline styles. + +### `javascript` + +JavaScript usage. + +### `lib-detector-version` + +Libraries detector version. + +### `localstorage_size` -### performance +Size of local storage. + +### `markup` + +HTML markup. + +### `media` + +Media elements. + +### `meta_viewport` + +Meta viewport tag. + +### `num_iframes` + +Number of iframes on a page. + +### `num_scripts` + +Number of script tags. + +### `num_scripts_async` + +Number of asynchronous scripts. + +### `num_scripts_sync` + +Number of synchronous scripts. + +### `observers` + +Metrics related to the usage of observer APIs. + +### `performance` + +Web performance. See the [`performance` custom metric](/reference/custom-metrics/performance/) for more information. + +### `privacy-sandbox` + +Privacy Sandbox initiative usage. + +### `privacy` + +Privacy settings and policies. + +See the [`privacy` custom metric](/reference/custom-metrics/privacy/) for more information. + +### `pwa` + +Progressive Web Apps. 
+ +### `quirks_mode` + +Usage of quirks mode in browsers. + +### `Resolution` + +Resolution of a screen. + +### `responsive_images` + +Responsive image techniques. + +### `robots_meta` + +Robots meta tag. + +### `robots_txt` + +robots.txt file. + +### `sass` + +Usage of Sass. + +### `security` + +Security features. + +### `sessionstorage_size` + +Size of session storage. + +### `structured-data` + +Structured data. + +### `third-parties` + +Third-party resources. + +### `usertiming` + +User Timing API. + +### `valid-head` + +Validity of the head element. + +### `well-known` + +well-known URIs. + +### `wpt_bodies` + +Metrics derived from WebPageTest bodies object. diff --git a/src/content/docs/reference/structs/header.mdx b/src/content/docs/reference/structs/header.mdx index 9ee58e2..0e93c8a 100644 --- a/src/content/docs/reference/structs/header.mdx +++ b/src/content/docs/reference/structs/header.mdx @@ -3,9 +3,10 @@ title: Header struct description: Reference docs for the header struct --- -_Appears in: [`requests` table](/reference/tables/requests/)_ +_Appears in: [`requests` table](/reference/tables/requests/)_\ +_As: `request_headers`, `response_headers`_ -Each headerĀ is a key-value pair corresponding to an HTTP header sent from or to the client: request and response headers, respectively. +Each header is a key-value pair corresponding to an HTTP header sent from or to the client: request and response headers, respectively. 
## Schema @@ -20,4 +21,4 @@ Header name ### `value` -Header value \ No newline at end of file +Header value diff --git a/src/content/docs/reference/structs/page-summary.md b/src/content/docs/reference/structs/page-summary.md new file mode 100644 index 0000000..117cb05 --- /dev/null +++ b/src/content/docs/reference/structs/page-summary.md @@ -0,0 +1,519 @@ +--- +title: Page summary struct +description: Reference docs for the page summary struct +--- + +_Appears in: [`pages` table](/reference/tables/pages/)_\ +_As: `summary`_ + +JSON-encoded summarization of the page-level data. + +Here's an example of the decoded object: + +```json +{ + "metadata": "{\"rank\": 500000, \"page_id\": 24036445, \"tested_url\": \"https://www.example.com/\", \"layout\": \"Desktop\", \"crawl_depth\": 0, \"link_depth\": 0, \"root_page_id\": 24036445, \"root_page_url\": \"https://www.example.com/\", \"root_page_test_id\": \"240709_Dx1UR_EB6N1\"}", + "pageid": 24036445, + "createDate": 1720729906, + "startedDateTime": 1720729879, + "archive": "All", + "label": "Jul 1 2024", + "crawlid": 0, + "url": "https://www.example.com/", + "urlhash": 56511, + "urlShort": "https://www.example.com/", + "TTFB": 236, + "renderStart": 400, + "fullyLoaded": 338, + "visualComplete": 400, + "onLoad": 325, + "gzipTotal": 648, + "gzipSavings": 0, + "numDomElements": 12, + "onContentLoaded": 254, + "cdn": "Edgecast", + "SpeedIndex": 400, + "PageSpeed": null, + "_connections": 1, + "_adult_site": false, + "avg_dom_depth": 3, + "doctype": "html", + "document_height": 993, + "document_width": 1920, + "localstorage_size": 0, + "sessionstorage_size": 0, + "meta_viewport": "width=device-width, initial-scale=1", + "num_iframes": 0, + "num_scripts": 0, + "num_scripts_sync": 0, + "num_scripts_async": 0, + "usertiming": 0, + "reqTotal": 2, + "bytesTotal": 1296, + "reqJS": 0, + "bytesJS": 0, + "reqImg": 1, + "bytesImg": 648, + "reqJson": 0, + "bytesJson": 0, + "reqCss": 0, + "bytesCss": 0, + "reqHtml": 1, + "bytesHtml": 648, 
+ "reqFont": 0, + "bytesFont": 0, + "reqOther": 0, + "bytesOther": 0, + "reqAudio": 0, + "bytesAudio": 0, + "reqVideo": 0, + "bytesVideo": 0, + "reqText": 0, + "bytesText": 0, + "reqXml": 0, + "bytesXml": 0, + "reqGif": 0, + "bytesGif": 0, + "reqJpg": 0, + "bytesJpg": 0, + "reqPng": 0, + "bytesPng": 0, + "reqWebp": 0, + "bytesWebp": 0, + "reqSvg": 0, + "bytesSvg": 0, + "reqFlash": 0, + "bytesFlash": 0, + "numDomains": 1, + "maxageNull": 0, + "maxage0": 0, + "maxage1": 0, + "maxage30": 2, + "maxage365": 0, + "maxageMore": 0, + "bytesHtmlDoc": 648, + "numRedirects": 0, + "numErrors": 1, + "numGlibs": 0, + "numHttps": 2, + "numCompressed": 2, + "maxDomainReqs": 1, + "wptid": "240709_Dx1UR_EB6N1", + "wptrun": 1, + "rank": 500000 +} +``` + +## Schema + +### `metadata` + +Additional metadata about the page. Object is similar to the [`metadata`](/reference/blobs/page-metadata/) field in the `pages` table. + +- #### `metadata.rank` + + The rank magnitude of the origin, which is a measure of relative popularity. + +- #### `metadata.page_id` + + A unique identifier for the page. + +- #### `metadata.tested_url` + + The actual URL of the page that was intended to be tested. + +- #### `metadata.layout` + + Whether the page was tested in a desktop or mobile environment. Values are `"Desktop"` or `"Mobile"`. + +- #### `metadata.crawl_depth` + + Levels of depth from the root page. HTTP Archive is currently configured to crawl one level into a website, so this value will always be `0` or `1`. + +- #### `metadata.link_depth` + + At a given crawl depth, this value represents the index in the list of pages being tested. Currently, HTTP Archive only crawls one page per level, so this value is always `0`. + +- #### `metadata.root_page_id` + + ID of the root page. At crawl depth `0` this is the same as the `page_id` field. + +- #### `metadata.root_page_url` + + URL of the root page. + +- #### `metadata.root_page_test_id` + + WebPageTest ID of the root page. 
+ +- #### `metadata.retry_count` + + The number of retries. + +- #### `metadata.parent_page_id` + + ID of the parent page. + +- #### `metadata.parent_page_url` + + URL of the parent page. + +- #### `metadata.parent_page_test_id` + + WebPageTest ID of the parent page. + +- #### `metadata.visited` + + A list of URLs visited during the crawl. + +### `pageid` + +The unique identifier for the page (same as `page_id` in metadata). + +### `createDate` + +The creation date of the record in Unix timestamp format. + +### `startedDateTime` + +The start date and time of the test in Unix timestamp format. + +### `archive` + +The archive category (e.g., "All"). + +### `label` + +A label for the test date. + +### `crawlid` + +The crawl identifier. + +### `url` + +The URL of the tested page. + +### `urlhash` + +A hash value of the URL. + +### `urlShort` + +A shortened version of the URL. + +### `TTFB` + +Time to First Byte in milliseconds. + +### `renderStart` + +The time when rendering started in milliseconds. + +### `fullyLoaded` + +The time when the page was fully loaded in milliseconds. + +### `visualComplete` + +The time when visual loading was complete in milliseconds. + +### `onLoad` + +The time when the onLoad event was triggered in milliseconds. + +### `gzipTotal` + +Total size of compressed data in bytes. + +### `gzipSavings` + +Bytes saved due to gzip compression. + +### `numDomElements` + +The number of DOM elements. + +### `onContentLoaded` + +The time when the DOMContentLoaded event was triggered in milliseconds. + +### `cdn` + +The Content Delivery Network used. + +### `SpeedIndex` + +The Speed Index score. + +### `PageSpeed` + +The PageSpeed score. + +### `_connections` + +The number of connections made. + +### `_adult_site` + +Boolean indicating if the site is an adult site. + +### `avg_dom_depth` + +Average DOM depth. + +### `doctype` + +The document type. + +### `document_height` + +The height of the document. 
+ +### `document_width` + +The width of the document. + +### `localstorage_size` + +The size of local storage used. + +### `sessionstorage_size` + +The size of session storage used. + +### `meta_viewport` + +The meta viewport tag content. + +### `num_iframes` + +The number of iframes. + +### `num_scripts` + +The number of script tags. + +### `num_scripts_sync` + +The number of synchronous scripts. + +### `num_scripts_async` + +The number of asynchronous scripts. + +### `usertiming` + +User timing data. + +### `reqTotal` + +Total number of requests made. + +### `bytesTotal` + +Total bytes transferred. + +### `reqJS` + +Number of JavaScript requests. + +### `bytesJS` + +Bytes transferred for JavaScript. + +### `reqImg` + +Number of image requests. + +### `bytesImg` + +Bytes transferred for images. + +### `reqJson` + +Number of JSON requests. + +### `bytesJson` + +Bytes transferred for JSON. + +### `reqCss` + +Number of CSS requests. + +### `bytesCss` + +Bytes transferred for CSS. + +### `reqHtml` + +Number of HTML requests. + +### `bytesHtml` + +Bytes transferred for HTML. + +### `reqFont` + +Number of font requests. + +### `bytesFont` + +Bytes transferred for fonts. + +### `reqOther` + +Number of other types of requests. + +### `bytesOther` + +Bytes transferred for other types of requests. + +### `reqAudio` + +Number of audio requests. + +### `bytesAudio` + +Bytes transferred for audio. + +### `reqVideo` + +Number of video requests. + +### `bytesVideo` + +Bytes transferred for video. + +### `reqText` + +Number of text requests. + +### `bytesText` + +Bytes transferred for text. + +### `reqXml` + +Number of XML requests. + +### `bytesXml` + +Bytes transferred for XML. + +### `reqGif` + +Number of GIF requests. + +### `bytesGif` + +Bytes transferred for GIFs. + +### `reqJpg` + +Number of JPG requests. + +### `bytesJpg` + +Bytes transferred for JPGs. + +### `reqPng` + +Number of PNG requests. + +### `bytesPng` + +Bytes transferred for PNGs. 
+ +### `reqWebp` + +Number of WebP requests. + +### `bytesWebp` + +Bytes transferred for WebP. + +### `reqSvg` + +Number of SVG requests. + +### `bytesSvg` + +Bytes transferred for SVG. + +### `reqFlash` + +Number of Flash requests. + +### `bytesFlash` + +Bytes transferred for Flash. + +### `numDomains` + +Number of unique domains. + +### `maxageNull` + +Number of resources with no max-age set. + +### `maxage0` + +Number of resources with a max-age of 0. + +### `maxage1` + +Number of resources with a max-age greater than 0 and up to 1 day. + +### `maxage30` + +Number of resources with a max-age greater than 1 day and up to 30 days. + +### `maxage365` + +Number of resources with a max-age greater than 30 days and up to 365 days. + +### `maxageMore` + +Number of resources with a max-age of more than 365 days. + +### `bytesHtmlDoc` + +Bytes transferred for the HTML document. + +### `numRedirects` + +Number of redirects. + +### `numErrors` + +Number of errors. + +### `numGlibs` + +Number of requests to Google-hosted libraries (`googleapis.com`). + +### `numHttps` + +Number of HTTPS requests. + +### `numCompressed` + +Number of compressed requests. + +### `maxDomainReqs` + +Maximum number of requests to a single domain. + +### `wptid` + +WebPageTest ID. + +### `wptrun` + +WebPageTest run number. + +### `rank` + +The rank magnitude of the origin, which is a measure of relative popularity. diff --git a/src/content/docs/reference/structs/request-summary.md b/src/content/docs/reference/structs/request-summary.md new file mode 100644 index 0000000..0bea6aa --- /dev/null +++ b/src/content/docs/reference/structs/request-summary.md @@ -0,0 +1,243 @@ +--- +title: Request summary struct +description: Reference docs for the request summary struct +--- + +_Appears in: [`requests` table](/reference/tables/requests/)_\ +_As: `summary`_ + +JSON-encoded summarization of request data. 
+ +Here's an example of the decoded object: + +```json +{ + "requestid": 103235745187102721, + "pageid": 24036445, + "crawlid": 0, + "startedDateTime": 1720729879, + "time": 260, + "_cdn_provider": "Edgecast", + "_gzip_save": 0, + "method": "GET", + "url": "https://www.example.com/", + "urlShort": "https://www.example.com/", + "reqHeadersSize": 683, + "reqBodySize": null, + "reqOtherHeaders": "priority = u=0, i, sec-ch-ua = \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"126\", \"Google Chrome\";v=\"126\", sec-ch-ua-mobile = ?0, sec-ch-ua-platform = \"Unknown\", sec-fetch-dest = document, sec-fetch-mode = navigate, sec-fetch-site = cross-site, upgrade-insecure-requests = 1", + "reqCookieLen": 0, + "status": 200, + "respHttpVersion": "HTTP/2", + "redirectUrl": null, + "respHeadersSize": 376, + "respBodySize": 648, + "respSize": 648, + "mimeType": "text/html", + "ext": "", + "type": "html", + "format": "", + "respOtherHeaders": "x-cache = HIT", + "respCookieLen": 0, + "expAge": 604800, + "req_accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7", + "req_accept_encoding": "gzip, deflate, br, zstd", + "req_accept_language": "en-US,en;q=0.9", + "req_user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 PTST/240709.152506", + "resp_accept_ranges": "bytes", + "resp_age": "238899", + "resp_cache_control": "max-age=604800", + "resp_content_encoding": "gzip", + "resp_content_length": "648", + "resp_content_type": "text/html; charset=UTF-8", + "resp_date": "Thu, 11 Jul 2024 20:31:19 GMT", + "resp_etag": "\"3147526947+gzip\"", + "resp_expires": "Thu, 18 Jul 2024 20:31:19 GMT", + "resp_last_modified": "Thu, 17 Oct 2019 07:18:26 GMT", + "resp_server": "ECAcc (dna/62BC)", + "resp_vary": "Accept-Encoding", + "firstReq": true, + "firstHtml": true +} +``` + +## Schema + +### `requestid` + +The unique identifier for the request. 
+ +### `pageid` + +The unique identifier for the page associated with the request. + +### `crawlid` + +The crawl identifier. + +### `startedDateTime` + +The start date and time of the request in Unix timestamp format. + +### `time` + +The total time taken for the request in milliseconds. + +### `_cdn_provider` + +The Content Delivery Network provider. + +### `_gzip_save` + +Bytes saved due to gzip compression. + +### `method` + +The HTTP method used for the request (e.g., "GET", "POST"). + +### `url` + +The URL of the requested resource. + +### `urlShort` + +A shortened version of the URL. + +### `reqHeadersSize` + +The size of the request headers in bytes. + +### `reqBodySize` + +The size of the request body in bytes. + +### `reqOtherHeaders` + +Additional headers sent with the request. + +### `reqCookieLen` + +The length of the request cookies. + +### `status` + +The HTTP status code of the response. + +### `respHttpVersion` + +The HTTP version used in the response. + +### `redirectUrl` + +The URL to which the request was redirected. + +### `respHeadersSize` + +The size of the response headers in bytes. + +### `respBodySize` + +The size of the response body in bytes. + +### `respSize` + +The total size of the response in bytes. + +### `mimeType` + +The MIME type of the response. + +### `ext` + +The file extension of the requested resource. + +### `type` + +The type of the requested resource (e.g., "audio", "image"). + +### `format` + +The format of the resource. + +### `respOtherHeaders` + +Additional headers received in the response. + +### `respCookieLen` + +The length of the response cookies. + +### `expAge` + +The cache lifetime of the response in seconds, as derived from its caching headers (e.g., `max-age`). + +### `req_accept` + +The value of the `Accept` header sent with the request. + +### `req_accept_encoding` + +The value of the `Accept-Encoding` header sent with the request. + +### `req_accept_language` + +The value of the `Accept-Language` header sent with the request. 
+ +### `req_if_modified_since` + +The value of the `If-Modified-Since` header sent with the request. + +### `req_if_none_match` + +The value of the `If-None-Match` header sent with the request. + +### `req_referer` + +The value of the `Referer` header sent with the request. + +### `req_user_agent` + +The value of the `User-Agent` header sent with the request. + +### `resp_age` + +The `Age` header value in the response. + +### `resp_cache_control` + +The `Cache-Control` header value in the response. + +### `resp_date` + +The `Date` header value in the response. + +### `resp_etag` + +The `ETag` header value in the response. + +### `resp_last_modified` + +The `Last-Modified` header value in the response. + +### `resp_server` + +The `Server` header value in the response. + +### `resp_vary` + +The `Vary` header value in the response. + +### `resp_content_length` + +The `Content-Length` header value in the response. + +### `resp_content_type` + +The `Content-Type` header value in the response. + +### `firstReq` + +Boolean indicating if this is the first request of the page. + +### `firstHtml` + +Boolean indicating if this is the first HTML request of the page. diff --git a/src/content/docs/reference/tables/pages.mdx b/src/content/docs/reference/tables/pages.mdx index b33e690..65a8cde 100644 --- a/src/content/docs/reference/tables/pages.mdx +++ b/src/content/docs/reference/tables/pages.mdx @@ -197,6 +197,8 @@ See the [`har`](/reference/blobs/har/) reference for more details. JSON-encoded summarization of the page-level data +See the [`summary`](/reference/structs/page-summary/) reference for more details. + ### `custom_metrics` JSON-encoded test results of the custom metrics. @@ -223,4 +225,4 @@ See the [`technology`](/reference/structs/technology/) reference for more detail Additional metadata about the test -See the [`metadata`](/reference/blobs/page-metdata/) reference for more details. +See the [`metadata`](/reference/blobs/page-metadata/) reference for more details. 
diff --git a/src/content/docs/reference/tables/requests.mdx b/src/content/docs/reference/tables/requests.mdx index 1741e1c..9e58c4f 100644 --- a/src/content/docs/reference/tables/requests.mdx +++ b/src/content/docs/reference/tables/requests.mdx @@ -226,14 +226,21 @@ JSON-encoded WebPageTest result data for this request JSON-encoded summarization of request data +See the [`summary`](/reference/structs/request-summary/) reference for more details. + ### `request_headers` Request headers +See the [Header](/reference/structs/header/) reference for more details. + ### `response_headers` Response headers +See the [Header](/reference/structs/header/) reference for more details. + ### `response_body` Text-based response body +
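
Since the `summary` column documented in this patch is JSON-encoded, a consumer has to decode it before reading individual fields. The following is a minimal sketch in Python, assuming the column value has already been fetched as a string. The `raw_summary` literal is an abbreviated copy of the example object from the request summary reference, and the helper name `remaining_freshness` is hypothetical. It treats `expAge` as the cache lifetime, which matches `max-age=604800` in that example, and subtracts the `Age` header value as an illustration only.

```python
import json
from typing import Optional

# Abbreviated copy of the decoded example from the request summary reference.
raw_summary = json.dumps({
    "requestid": 103235745187102721,
    "status": 200,
    "type": "html",
    "expAge": 604800,
    "resp_age": "238899",
    "resp_cache_control": "max-age=604800",
    "firstReq": True,
    "firstHtml": True,
})


def remaining_freshness(summary: dict) -> Optional[int]:
    """Seconds of cache lifetime left, or None if no lifetime is recorded.

    Hypothetical helper for illustration; not part of any HTTP Archive API.
    """
    exp_age = summary.get("expAge")
    if exp_age is None:
        return None
    # resp_age mirrors the Age response header and is stored as a string.
    age = int(summary.get("resp_age") or 0)
    return max(exp_age - age, 0)


summary = json.loads(raw_summary)
print(summary["type"], summary["status"])
print(remaining_freshness(summary))
```

Header-derived fields such as `resp_age` are strings in the example object, so they need converting before any arithmetic.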