reorder the tabs
max-ostapenko committed Nov 20, 2024
1 parent d892aff commit 23031ad
Showing 1 changed file with 96 additions and 98 deletions.
194 changes: 96 additions & 98 deletions src/content/docs/guides/migrating-to-crawl-dataset.mdx
@@ -33,6 +33,25 @@ not available | [`metadata`](/reference/tables/pages/#metadata)
- Migrate custom metrics

<Tabs>
<TabItem label="crawl.pages">
```sql
/* This query will process 115 GB when run. */
SELECT
client,
STRING(custom_metrics.performance.lcp_elem_stats.nodeName) AS lcp_elem_node_name,
AVG(INT64(custom_metrics.performance.lcp_elem_stats.size)) AS lcp_elem_node_size
FROM `httparchive.crawl.pages`
WHERE
date = '2024-10-01' AND
is_root_page
GROUP BY
client,
lcp_elem_node_name
ORDER BY
client,
lcp_elem_node_size DESC
```
</TabItem>
<TabItem label="pages.YYYY_MM_DD_client">
```sql
/* This query will process 6.44 TB when run. */
@@ -68,30 +87,30 @@ not available | [`metadata`](/reference/tables/pages/#metadata)
lcp_elem_node_size DESC
```
</TabItem>
</Tabs>

- Migrate summary metrics queries

<Tabs>
<TabItem label="crawl.pages">
```sql
/* This query will process 115 GB when run. */
/* This query will process 34 GB when run. */
SELECT
client,
STRING(custom_metrics.performance.lcp_elem_stats.nodeName) AS lcp_elem_node_name,
AVG(INT64(custom_metrics.performance.lcp_elem_stats.size)) AS lcp_elem_node_size
INT64(summary.numDomains) AS numDomains,
COUNT(0) pages,
AVG(INT64(summary.reqTotal)) AS avg_requests
FROM `httparchive.crawl.pages`
WHERE
date = '2024-10-01' AND
is_root_page
GROUP BY
client,
lcp_elem_node_name
ORDER BY
client,
lcp_elem_node_size DESC
numDomains
HAVING pages > 1000
ORDER BY numDomains ASC
```
</TabItem>
</Tabs>

- Migrate summary metrics queries

<Tabs>
<TabItem label="summary_pages.YYYY_MM_DD_client">
```sql
/* This query will process 440 MB when run. */
@@ -127,30 +146,27 @@ not available | [`metadata`](/reference/tables/pages/#metadata)
ORDER BY numDomains ASC
```
</TabItem>
</Tabs>

- Migrate detected technologies metrics

<Tabs>
<TabItem label="crawl.pages">
```sql
/* This query will process 34 GB when run. */
/* This query will process 7.18 GB when run. */
SELECT
client,
INT64(summary.numDomains) AS numDomains,
COUNT(0) pages,
AVG(INT64(summary.reqTotal)) AS avg_requests
FROM `httparchive.crawl.pages`
page,
technologies.categories,
technologies.technology,
technologies.info
FROM `httparchive.crawl.pages`,
UNNEST (technologies) AS technologies
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_root_page
GROUP BY
client,
numDomains
HAVING pages > 1000
ORDER BY numDomains ASC
```
</TabItem>
</Tabs>

- Migrate detected technologies metrics

<Tabs>
<TabItem label="technologies.YYYY_MM_DD_client">
```sql
/* This query will process 14 GB when run. */
@@ -162,7 +178,6 @@ not available | [`metadata`](/reference/tables/pages/#metadata)
FROM `httparchive.technologies.2024_10_01_desktop`
```
</TabItem>

<TabItem label="all.pages">
```sql
/* This query will process 7.18 GB when run. */
@@ -179,28 +194,24 @@ not available | [`metadata`](/reference/tables/pages/#metadata)
is_root_page
```
</TabItem>
</Tabs>

- Migrate lighthouse insights

<Tabs>
<TabItem label="crawl.pages">
```sql
/* This query will process 7.18 GB when run. */
/* This query will process 4.2 TB when run. */
SELECT
page,
technologies.categories,
technologies.technology,
technologies.info
FROM `httparchive.crawl.pages`,
UNNEST (technologies) AS technologies
lighthouse.audits.`largest-contentful-paint`.numericValue AS LCP
FROM `httparchive.crawl.pages`
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_root_page
```
</TabItem>
</Tabs>

- Migrate lighthouse insights

<Tabs>
<TabItem label="lighthouse.YYYY_MM_DD_client">
```sql
/* This query will process 4.23 TB when run. */
@@ -223,24 +234,27 @@ not available | [`metadata`](/reference/tables/pages/#metadata)
is_root_page
```
</TabItem>
</Tabs>

- Migrate Blink features metrics

<Tabs>
<TabItem label="crawl.pages">
```sql
/* This query will process 4.2 TB when run. */
/* This query will process 114 GB when run. */
SELECT
page,
lighthouse.audits.`largest-contentful-paint`.numericValue AS LCP
FROM `httparchive.crawl.pages`
features.feature,
features.type,
features.id
FROM `httparchive.crawl.pages`,
UNNEST (features) AS features
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_root_page
```
</TabItem>
</Tabs>

- Migrate Blink features metrics

<Tabs>
<TabItem label="blink_features.features">
```sql
/* This query will process 548 GB when run. */
@@ -271,22 +285,6 @@ not available | [`metadata`](/reference/tables/pages/#metadata)
is_root_page
```
</TabItem>
<TabItem label="crawl.pages">
```sql
/* This query will process 114 GB when run. */
SELECT
page,
features.feature,
features.type,
features.id
FROM `httparchive.crawl.pages`,
UNNEST (features) AS features
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_root_page
```
</TabItem>
</Tabs>

## Migrating to `crawl.requests`
@@ -315,6 +313,21 @@ not available | [`root_page`](/reference/tables/requests/#root_page)
- Migrate headers metrics

<Tabs>
<TabItem label="crawl.requests">
```sql
/* This query will process 169 GB when run. */
SELECT
response_header.value AS header_value,
FROM `httparchive.crawl.requests`,
UNNEST(response_headers) AS response_header
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_main_document AND
is_root_page AND
LOWER(response_header.name) = 'content-type'
```
</TabItem>
<TabItem label="summary_requests.YYYY_MM_DD_client">
```sql
/* This query will process 22.5 GB when run. */
@@ -341,26 +354,26 @@ not available | [`root_page`](/reference/tables/requests/#root_page)
LOWER(response_header.name) = 'content-type'
```
</TabItem>
</Tabs>

- Migrate summary metrics

<Tabs>
<TabItem label="crawl.requests">
```sql
/* This query will process 169 GB when run. */
/* This query will process 376 GB when run. */
SELECT
response_header.value AS header_value,
FROM `httparchive.crawl.requests`,
UNNEST(response_headers) AS response_header
page,
url,
STRING(summary.mimeType) AS mimeType,
INT64(summary.respBodySize) AS respBodySize,
FROM `httparchive.crawl.requests`
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_main_document AND
is_root_page AND
LOWER(response_header.name) = 'content-type'
is_root_page
```
</TabItem>
</Tabs>

- Migrate summary metrics

<Tabs>
<TabItem label="summary_requests.YYYY_MM_DD_client">
```sql
/* This query will process 193 GB when run. */
@@ -398,26 +411,25 @@ not available | [`root_page`](/reference/tables/requests/#root_page)
is_root_page
```
</TabItem>
</Tabs>

- Migrate response body queries

<Tabs>
<TabItem label="crawl.requests">
```sql
/* This query will process 376 GB when run. */
/* This query will process 42.8 TB when run. */
SELECT
page,
url,
STRING(summary.mimeType) AS mimeType,
INT64(summary.respBodySize) AS respBodySize,
BYTE_LENGTH(response_body) AS bodySize
FROM `httparchive.crawl.requests`
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_root_page
```
</TabItem>
</Tabs>

- Migrate response body queries

<Tabs>
<TabItem label="response_bodies.YYYY_MM_DD_client">
```sql
/* This query will process 40.7 TB when run. */
@@ -428,18 +440,4 @@ not available | [`root_page`](/reference/tables/requests/#root_page)
FROM `httparchive.response_bodies.2024_06_01_desktop`
```
</TabItem>
<TabItem label="crawl.requests">
```sql
/* This query will process 42.8 TB when run. */
SELECT
page,
url,
BYTE_LENGTH(response_body) AS bodySize
FROM `httparchive.crawl.requests`
WHERE
date = '2024-10-01' AND
client = 'desktop' AND
is_root_page
```
</TabItem>
</Tabs>
