Commit

docs
rviscomi committed Jun 6, 2023
1 parent aebad5f commit a691596
Showing 9 changed files with 307 additions and 96 deletions.
12 changes: 10 additions & 2 deletions astro.config.mjs
Original file line number Diff line number Diff line change
@@ -22,8 +22,16 @@ export default defineConfig({
],
},
{
- label: 'Reference',
- autogenerate: { directory: 'reference' },
+ label: 'Tables',
+ autogenerate: { directory: 'reference/tables' }
+ },
+ {
+ label: 'Structs',
+ autogenerate: { directory: 'reference/structs' }
+ },
+ {
+ label: 'Blobs',
+ autogenerate: { directory: 'reference/blobs' }
},
],
}),
Binary file removed src/assets/houston.webp
Binary file not shown.
10 changes: 10 additions & 0 deletions src/content/docs/reference/blobs/lighthouse.mdx
@@ -0,0 +1,10 @@
---
title: Lighthouse blob
description: Reference docs for the Lighthouse blob
---

_Appears in: [`pages` table](/reference/tables/pages/)_

JSON-encoded blob of Lighthouse data for the page.

**The actual schema of the Lighthouse object is liable to change depending on the page and Lighthouse version.**
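Because the schema can change, code that reads the blob should tolerate missing keys. Here is a minimal Python sketch of that defensive style. The `get_path` helper is hypothetical, and the `categories.performance.score` path is an assumption about a typical Lighthouse report, not something this page guarantees.

```python
import json


def get_path(blob, *keys):
    """Walk nested JSON objects; return None if the blob or any key is missing."""
    try:
        node = json.loads(blob)
    except (TypeError, ValueError):
        # NULL, empty, or malformed blob.
        return None
    for key in keys:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


# Hypothetical blob; the key names are assumed, not taken from this page.
blob = '{"categories": {"performance": {"score": 0.93}}}'
print(get_path(blob, "categories", "performance", "score"))
print(get_path("{}", "categories", "performance", "score"))
```

Returning `None` instead of raising keeps one page with an unexpected report shape from failing a whole analysis.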
93 changes: 0 additions & 93 deletions src/content/docs/reference/pages.mdx

This file was deleted.

28 changes: 28 additions & 0 deletions src/content/docs/reference/structs/feature.mdx
@@ -0,0 +1,28 @@
---
title: Feature struct
description: Reference docs for the feature struct
---

_Appears in: [`pages` table](/reference/tables/pages/)_

Each feature is a Blink feature detected at runtime on the page.

## Schema

Field name | Type | Description
---|---|---
`feature` | `STRING` | Blink feature name
`id` | `STRING` | Blink feature ID
`type` | `STRING` | Blink feature type (css, default)

### `feature`

Blink feature name

### `id`

Blink feature ID

### `type`

Blink feature type (css, default)
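
As a rough illustration of how an array of these structs might be consumed once rows are exported from the `pages` table, the following Python sketch counts how many pages use each feature. The rows, feature names, and IDs are made up for the example, and `pages_per_feature` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical rows; the feature names and IDs are invented for illustration.
rows = [
    {"page": "https://a.example/", "features": [
        {"feature": "ExampleFeatureA", "id": "1", "type": "default"},
        {"feature": "ExampleFeatureB", "id": "2", "type": "css"},
    ]},
    {"page": "https://b.example/", "features": [
        {"feature": "ExampleFeatureA", "id": "1", "type": "default"},
    ]},
]


def pages_per_feature(rows):
    """Count the number of pages on which each feature name appears."""
    counts = Counter()
    for row in rows:
        # A set ensures a feature counts at most once per page.
        counts.update({f["feature"] for f in row["features"]})
    return counts


counts = pages_per_feature(rows)
print(counts.most_common())
```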
23 changes: 23 additions & 0 deletions src/content/docs/reference/structs/header.mdx
@@ -0,0 +1,23 @@
---
title: Header struct
description: Reference docs for the header struct
---

_Appears in: [`requests` table](/reference/tables/requests/)_

Each header is a key-value pair corresponding to an HTTP header sent from or to the client: request and response headers, respectively.

## Schema

Field name | Type | Description
---|---|---
`name` | `STRING` | Header name
`value` | `STRING` | Header value

### `name`

Header name

### `value`

Header value
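
Since headers are stored as a list of name/value pairs rather than a map, looking one up means scanning the list. The Python sketch below shows one way to do that on exported rows. The `header_values` helper and the sample headers are hypothetical. The case-insensitive comparison reflects that HTTP header names are case-insensitive, and a list is returned because the same header can appear more than once.

```python
def header_values(headers, name):
    """Return every value whose header name matches, ignoring case."""
    wanted = name.lower()
    return [h["value"] for h in headers if h["name"].lower() == wanted]


# Hypothetical response headers in the name/value shape described above.
response_headers = [
    {"name": "Content-Type", "value": "text/html; charset=utf-8"},
    {"name": "set-cookie", "value": "a=1"},
    {"name": "Set-Cookie", "value": "b=2"},
]

print(header_values(response_headers, "content-type"))
print(header_values(response_headers, "Set-Cookie"))
```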
@@ -3,7 +3,7 @@ title: Technology struct
description: Reference docs for the technology struct
---

- _Appears in: [`pages` table](/reference/pages)_
+ _Appears in: [`pages` table](/reference/tables/pages/)_

Technologies are detected by [Wappalyzer](https://www.wappalyzer.com/). Refer to the [Wappalyzer repository](https://github.com/wappalyzer/wappalyzer) on GitHub to request a new technology detection or to browse the source code of existing detections.

146 changes: 146 additions & 0 deletions src/content/docs/reference/tables/pages.mdx
@@ -0,0 +1,146 @@
---
title: Pages table
description: Reference docs for the httparchive.all.pages table
---

import { Tabs, TabItem } from '@astrojs/starlight/components';

[`httparchive.all.pages`](https://console.cloud.google.com/bigquery?ws=!1m5!1m4!4m3!1shttparchive!2sall!3spages) is a partitioned and clustered table containing one row per page tested in the HTTP Archive. Pages are tested monthly and, as of April 2022, both the root page and one secondary page are tested.

## Example queries

Here are some common operations you can perform with the `pages` table.

### Get the median page weight

<Tabs>
<TabItem label="Query">

```sql
/* This query will process 1.12 GB when run. */
WITH pages AS (
  SELECT
    client,
    CAST(JSON_VALUE(summary, '$.bytesTotal') AS INT64) AS page_weight
  FROM
    `httparchive.all.pages` TABLESAMPLE SYSTEM (1 PERCENT)
  WHERE
    date = '2023-05-01'
)

SELECT
  client,
  APPROX_QUANTILES(page_weight, 1000)[OFFSET(500)] AS median_page_weight
FROM
  pages
GROUP BY
  client
```

</TabItem>
<TabItem label="Results">

client | median_page_weight
-- | --
mobile | 1776291
desktop | 2029751

The median mobile page weighs 1.78 MB and the median desktop page weighs 2.03 MB.

</TabItem>
</Tabs>

This query uses the [`APPROX_QUANTILES`](https://cloud.google.com/bigquery/docs/reference/standard-sql/approximate_aggregate_functions#approx_quantiles) function to calculate the median page weight for each client type as of May 2023.
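
To make the `APPROX_QUANTILES(page_weight, 1000)[OFFSET(500)]` expression more concrete, here is a minimal Python sketch of the idea: dividing the sorted values into 1,000 buckets yields 1,001 boundaries, and boundary 500 sits at the midpoint. The `quantile_boundaries` helper and the sample weights are hypothetical. The sketch computes exact nearest-rank boundaries, whereas BigQuery returns approximate ones.

```python
def quantile_boundaries(values, num_buckets):
    """Return num_buckets + 1 boundaries (minimum, cut points, maximum).

    Expects a non-empty list of values.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    # Nearest-rank index for each boundary, using integer arithmetic only.
    return [
        ordered[(i * last + num_buckets // 2) // num_buckets]
        for i in range(num_buckets + 1)
    ]


# Hypothetical page weights in bytes.
page_weights = [2_500_000, 900_000, 1_800_000, 4_200_000, 1_200_000]
boundaries = quantile_boundaries(page_weights, 1000)
median_page_weight = boundaries[500]
print(len(boundaries), median_page_weight)
```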

The `bytesTotal` property of the `summary` object represents the total number of bytes loaded on the page. This value is stored as a JSON-encoded string, so we use [`JSON_VALUE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#json_value) to extract it and [`CAST`](https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_functions#cast) to convert it to an integer.
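
For readers processing exported rows outside BigQuery, the same extract-then-convert step might look like the following Python sketch. The `page_weight` helper and the sample `summary` strings are hypothetical, and the sketch returns `None` when the property is missing.

```python
import json


def page_weight(summary):
    """Extract bytesTotal from a JSON-encoded summary and convert it to int."""
    value = json.loads(summary).get("bytesTotal")
    return None if value is None else int(value)


# Hypothetical summary strings; real summaries contain many more properties.
print(page_weight('{"bytesTotal": 1776291}'))
print(page_weight('{"bytesTotal": "2029751"}'))
print(page_weight('{}'))
```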

We're also using the [`WITH`](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#with_clause) clause here to create a temporary table called `pages`, which is then fed into the main query below. This makes the query a bit easier to read.

For demonstration purposes, this query processes only a 1% sample of the `httparchive.all.pages` table. Sampling reduces the amount of data processed, which can help reduce costs, but the results will be less accurate than if you ran the query on the full table.
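
The accuracy trade-off of sampling can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions: the page weights are randomly generated, and rows are sampled individually, whereas `TABLESAMPLE SYSTEM` samples blocks of data rather than individual rows.

```python
import random
import statistics

rng = random.Random(42)  # Fixed seed so the run is repeatable.

# Hypothetical page weights in bytes, not real HTTP Archive data.
population = [rng.randint(100_000, 5_000_000) for _ in range(10_000)]

# Keep roughly 1% of the rows, analogous to a 1 PERCENT sample.
sample = rng.sample(population, k=len(population) // 100)

full_median = statistics.median(population)
sample_median = statistics.median(sample)
print(f"full: {full_median}, sample: {sample_median}")
```

The two medians will generally differ somewhat, which is the loss of accuracy you accept in exchange for processing less data.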

## Schema

Field name | Type | Description
---|---|---
[`date`](#date) | `DATE` | YYYY-MM-DD format of the HTTP Archive monthly crawl
[`client`](#client) | `STRING` | Test environment: `'desktop'` or `'mobile'`
[`page`](#page) | `STRING` | The URL of the page being tested
[`is_root_page`](#is_root_page) | `BOOLEAN` | Whether the page is the root of the origin
[`root_page`](#root_page) | `STRING` | The URL of the root page being tested, the origin followed by `/`
[`rank`](#rank) | `INTEGER` | Site popularity rank, from CrUX
[`wptid`](#wptid) | `STRING` | ID of the WebPageTest results
[`payload`](#payload) | `STRING` | JSON-encoded WebPageTest results for the page
[`summary`](#summary) | `STRING` | JSON-encoded summarization of the page-level data
[`custom_metrics`](#custom_metrics) | `STRING` | JSON-encoded test results of the custom metrics
[`lighthouse`](#lighthouse) | [`Lighthouse`](/reference/blobs/lighthouse/) | JSON-encoded Lighthouse report
[`features`](#features) | <code>ARRAY&lt;<a href="/reference/structs/feature/">Feature</a>></code> | Blink features detected at runtime
[`technologies`](#technologies) | <code>ARRAY&lt;<a href="/reference/structs/technology/">Technology</a>></code> | Technologies detected at runtime
[`metadata`](#metadata) | `STRING` | Additional metadata about the test

### `date`

**This field is required for all queries over the `pages` table.**

YYYY-MM-DD format of the HTTP Archive monthly crawl.

Example: `date = '2023-06-01'`

### `client`

Test environment: `'desktop'` or `'mobile'`.

### `page`

The URL of the page being tested.

Example: `page = 'https://har.fyi/'`

### `is_root_page`

Whether the page is the root of the origin.

### `root_page`

The URL of the root page being tested, the origin followed by `/`.

Example: `root_page = 'https://har.fyi/'`
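
The relationship between `page`, `root_page`, and `is_root_page` can be sketched in Python as follows. This is an illustration of "the origin followed by `/`", assuming the origin is simply the scheme and host of the page URL. It is not necessarily how the crawl itself computes these fields.

```python
from urllib.parse import urlsplit


def root_page(page):
    """Return the origin of the page URL followed by '/'."""
    parts = urlsplit(page)
    return f"{parts.scheme}://{parts.netloc}/"


def is_root_page(page):
    """A page is the root page when its URL equals its own root page URL."""
    return page == root_page(page)


print(root_page("https://har.fyi/reference/tables/pages/"))
print(is_root_page("https://har.fyi/"))
```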

### `rank`

Site popularity rank, from CrUX.

### `wptid`

ID of the WebPageTest results.

### `payload`

JSON-encoded WebPageTest results for the page.

### `summary`

JSON-encoded summarization of the page-level data.

### `custom_metrics`

JSON-encoded test results of the custom metrics.

### `lighthouse`

JSON-encoded Lighthouse report.

See the [`lighthouse`](/reference/blobs/lighthouse/) reference for more details.

### `features`

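Blink features detected at runtime (see [Chrome Platform Status](https://chromestatus.com/features)).

See the [`feature`](/reference/structs/feature/) reference for more details.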

### `technologies`

Technologies detected at runtime (see [Wappalyzer](https://www.wappalyzer.com/)).

See the [`technology`](/reference/structs/technology/) reference for more details.

### `metadata`

Additional metadata about the test.