Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 2 changed files with 85 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
--- | ||
title: Page metadata blob | ||
description: Reference docs for the HAR page metadata blob | ||
--- | ||
|
||
_Appears in: [`pages` table](/reference/tables/pages/)_\ | ||
_As: `metadata`_ | ||
|
||
JSON-encoded HTTP Archive metadata about the page that was tested. | ||
|
||
Here's an example of the decoded object: | ||
|
||
```json | ||
{ | ||
"rank": 500000, | ||
"page_id": 26243429, | ||
"tested_url": "https://www.example.com/", | ||
"layout": "Desktop", | ||
"crawl_depth": 0, | ||
"link_depth": 0, | ||
"root_page_id": 26243429, | ||
"root_page_url": "https://www.example.com/", | ||
"root_page_test_id": "230509_Dx20W_FMHK5" | ||
} | ||
``` | ||
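As a minimal sketch of how the blob might be consumed, the following Python decodes the example object above with the standard `json` module. The `raw_metadata` variable is a stand-in for the value of the `metadata` column; how that value is fetched from the `pages` table is not shown and is left as an assumption.

```python
import json

# Hypothetical raw value of the `metadata` column, copied from the example above.
raw_metadata = """
{
  "rank": 500000,
  "page_id": 26243429,
  "tested_url": "https://www.example.com/",
  "layout": "Desktop",
  "crawl_depth": 0,
  "link_depth": 0,
  "root_page_id": 26243429,
  "root_page_url": "https://www.example.com/",
  "root_page_test_id": "230509_Dx20W_FMHK5"
}
"""

# Decode the JSON-encoded blob into a regular dictionary.
metadata = json.loads(raw_metadata)

# A crawl depth of 0 identifies a root page (see `crawl_depth` in the schema).
is_root_page = metadata["crawl_depth"] == 0

print(metadata["tested_url"], metadata["layout"], is_root_page)
```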
|
||
## Schema | ||
|
||
### `rank` | ||
|
||
The rank magnitude of the origin, which is a measure of relative popularity. | ||
|
||
For example, the page `https://www.example.com/` has a rank of `500000`, which means that the `https://www.example.com` website is among the 500,000 most popular websites, according to the [Chrome UX Report](https://developer.chrome.com/docs/crux/methodology/#popularity-metric). | ||
|
||
### `page_id` | ||
|
||
A unique identifier for the page. | ||
|
||
This is used for internal deduplication purposes and doesn't reflect anything about the page itself. | ||
|
||
### `tested_url` | ||
|
||
The actual URL of the page that was intended to be tested. | ||
|
||
This is useful for resolving ambiguity caused by redirects, or by known issues where the certificate request appears before the request for the page itself. | ||
|
||
### `layout` | ||
|
||
Whether the page was tested in a desktop or mobile environment. Values are `"Desktop"` or `"Mobile"`. | ||
|
||
### `crawl_depth` | ||
|
||
The number of link levels between this page and the root page. HTTP Archive is currently configured to crawl one level into a website, so this value is always `0` or `1`. | ||
|
||
A value of `0` means that the page is the root page. A value of `1` means that the page is an _interior_ or _secondary page_. | ||
|
||
When a root page is being tested, secondary page candidates are collected using the [`crawl-links`](https://github.com/HTTPArchive/custom-metrics/blob/main/dist/crawl_links.js) custom metric. The criteria for candidate pages are: | ||
|
||
- The page is linked from the root page | ||
- The page is on the same origin as the root page | ||
- The page is not the same as the root page | ||
- The link to the page is visible within the viewport | ||
|
||
From the list of candidates, the link with the largest hit area is selected to be tested next. If that test fails, the next largest link is used. | ||
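The selection rules above can be sketched in Python. This is an illustration only, not the `crawl-links` custom metric itself: the `Link` shape, the `rank_candidates` helper, the fragment stripping, and the origin comparison (scheme, host, and port without default-port normalization) are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urldefrag, urlsplit


@dataclass
class Link:
    """Hypothetical record for one link found on the root page."""
    url: str
    hit_area: float    # size of the clickable area (unit assumed)
    in_viewport: bool  # whether the link is visible within the viewport


def origin(url: str) -> tuple:
    """Simplified origin: scheme, host, and explicit port."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def rank_candidates(root_url: str, links: list) -> list:
    """Return candidate URLs ordered from largest to smallest hit area."""
    root = urldefrag(root_url).url  # ignore fragments (an assumption)
    seen = set()
    candidates = []
    for link in sorted(links, key=lambda item: item.hit_area, reverse=True):
        url = urldefrag(link.url).url
        if not link.in_viewport:
            continue  # link is not visible within the viewport
        if origin(url) != origin(root):
            continue  # page is on a different origin
        if url == root or url in seen:
            continue  # same as the root page, or already listed
        seen.add(url)
        candidates.append(url)
    return candidates


links = [
    Link("https://www.example.com/about", 1200.0, True),
    Link("https://www.example.com/", 5000.0, True),
    Link("https://other.example.org/", 9000.0, True),
    Link("https://www.example.com/products", 3000.0, True),
    Link("https://www.example.com/footer", 8000.0, False),
]
ranked = rank_candidates("https://www.example.com/", links)
print(ranked)
```

The first entry of `ranked` would be tested first. Returning the whole ordered list, rather than a single URL, leaves room for the fallback to the next largest link if that test fails.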
|
||
### `link_depth` | ||
|
||
At a given crawl depth, this value represents the index in the list of pages being tested. Currently, HTTP Archive only crawls one page per level, so this value is always `0`. | ||
|
||
Hypothetically, HTTP Archive could crawl multiple pages per level. For example, at crawl depth `0`, the page is the root. Given that there's only one root page, the `link_depth` would be `0`. At crawl depth `1`, there may be many secondary page candidates. Instead of testing only one of them, HTTP Archive could test multiple secondary pages that are all linked from the root page. These pages would have a `link_depth` of `0`, `1`, `2`, and so on, where smaller indexes represent pages that are more prominently linked from the preceding page. | ||
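To illustrate the hypothetical multi-page case, the sketch below assigns a `link_depth` to each secondary page taken from a list that is already ordered by prominence. The `assign_link_depths` helper and its `max_pages` parameter are invented for this example and do not correspond to any HTTP Archive setting.

```python
def assign_link_depths(ranked_urls, max_pages=1):
    """Label the first `max_pages` URLs with their index as `link_depth`.

    `ranked_urls` is assumed to be ordered from most to least prominent.
    """
    return [
        {"tested_url": url, "crawl_depth": 1, "link_depth": index}
        for index, url in enumerate(ranked_urls[:max_pages])
    ]


ranked_urls = [
    "https://www.example.com/products",
    "https://www.example.com/about",
    "https://www.example.com/contact",
]

# Current behavior as described above: one page per level, so link_depth is 0.
current = assign_link_depths(ranked_urls)

# Hypothetical behavior: test two secondary pages linked from the root page.
hypothetical = assign_link_depths(ranked_urls, max_pages=2)
print(current, hypothetical)
```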
|
||
### `root_page_id` | ||
|
||
ID of the root page. At crawl depth `0` this is the same as the `page_id` field. | ||
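The relationship between `root_page_id` and `page_id` can be expressed as a small consistency check. This is a sketch over an already decoded metadata dictionary; the `root_id_is_consistent` helper is hypothetical, and it places no constraint on secondary pages because none is stated here.

```python
def root_id_is_consistent(metadata: dict) -> bool:
    """Check that a root page (crawl depth 0) points to itself as its root."""
    if metadata["crawl_depth"] == 0:
        return metadata["root_page_id"] == metadata["page_id"]
    # No rule is documented for secondary pages, so nothing is checked.
    return True


# Values taken from the example object at the top of this page.
root_page = {"crawl_depth": 0, "page_id": 26243429, "root_page_id": 26243429}
print(root_id_is_consistent(root_page))
```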
|
||
### `root_page_url` | ||
|
||
URL of the root page. | ||
|
||
### `root_page_test_id` | ||
|
||
WebPageTest ID of the root page. |