
Commit 02ee72f

add topics attribute to search (github#18212)
1 parent 17f09e0 commit 02ee72f

15 files changed, 174 additions and 24 deletions

content/README.md

+3-3
Original file line numberDiff line numberDiff line change
@@ -228,9 +228,9 @@ includeGuides:
228228
- Optional.
229229

230230
### `topics`
231-
- Purpose: Indicate the topics covered by the article.
232-
- Type: `String`
233-
- Optional.
231+
- Purpose: Indicate the topics covered by the article. The topics are used to filter guides on some landing pages. For example, the guides at the bottom of [this page](https://docs.github.com/en/actions/guides) can be filtered by topic, and the topics are listed under each guide's intro. Topics are also added to every search record created for a page. Each search record contains a `topics` property that is used to filter search results by topic. For more information, see the [Search](/contributing/search.md) contributing guide. Refer to the content models for more details about adding topics. A full list of existing topics is located in the [allowed topics file](/data/allowed-topics.js). If the topics in article frontmatter and the allowed topics list become out of sync, the [topics CI test](/tests/unit/search/topics.js) will fail.
232+
- Type: Array of `String`s
233+
- Optional: Topics are preferred for each article, but there may be cases where existing articles don't yet have topics or where adding a topic to a new article may not add value.
234234

235235
### `contributor`
236236
- Purpose: Indicate an article is contributed and maintained by a third-party organization, typically a GitHub Technology Partner.

contributing/search.md

+4-1
@@ -15,7 +15,10 @@ To see all existing search-related issues and pull requests, visit [github.com/g
1515
## How to search
1616

1717
The site search is part of every version of docs.github.com. On any page, you can use the search box to search the documents we've indexed.
18-
You can also query our search endpoint directly at: https://docs.github.com/search?language=en&version=dotcom&query=jekyll
18+
You can also query our search endpoint directly at: https://docs.github.com/search?language=en&version=dotcom&query=jekyll+topics:actions
19+
20+
Including the `topics` attribute in your query returns only results that have a matching topic value. You can find a full list of topics in [the allowed topics file](/data/allowed-topics.js). The `topics` attribute is configured as a [`filter only` facet in Algolia](https://www.algolia.com/doc/guides/managing-results/refine-results/filtering/).
21+
1922
This endpoint responds in JSON format, and fronts Algolia and Lunr. We recommend using this endpoint over directly integrating with Algolia or Lunr, as the endpoint will be more stable.
2023

2124
## Production deploys
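The `topics` filter described in the contributing/search.md change above can be sketched as a small URL builder. This is a minimal sketch, not code from this commit: `buildSearchUrl` and its parameter names are hypothetical, and it assumes the endpoint accepts the `query <space> topics:<value>` form shown in the example URL.

```javascript
// Hypothetical helper (not part of this commit): builds a search endpoint URL
// that appends a `topics:` filter to the free-text query.
function buildSearchUrl ({ language = 'en', version = 'dotcom', query, topic }) {
  const url = new URL('https://docs.github.com/search')
  // Assumption: the filter is written as `topics:<value>` after the query text,
  // as in the example URL in contributing/search.md.
  const fullQuery = topic ? `${query} topics:${topic}` : query
  url.searchParams.set('language', language)
  url.searchParams.set('version', version)
  url.searchParams.set('query', fullQuery)
  return url.toString()
}

const exampleUrl = buildSearchUrl({ query: 'jekyll', topic: 'actions' })
console.log(exampleUrl)
```

Letting `URL` handle the encoding avoids hand-escaping the space between the query text and the filter.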

data/allowed-topics.js

+84
@@ -0,0 +1,84 @@
1+
// This is an AllowList of topics that are approved for use in the `topics`
2+
// frontmatter property. If a new topic is added to a Markdown file it must
3+
// also be added to this file.
4+
5+
// The purpose of this list is to ensure we prevent typos and put a process in
6+
// place to keep a curated list of topics. This list also serves as a list of
7+
// available topic filters when using the search endpoint
8+
// (see /contributing/search#how-to-search)
9+
// If you'd like to add a new topic, consult the topic guidelines in the
10+
// content model, add the entry to this list, and ensure you loop in the
11+
// content and/or content strategy team for review.
12+
13+
module.exports = [
14+
'2fa',
15+
'Action development',
16+
'Amazon ECS',
17+
'Ant',
18+
'Azure App Service',
19+
'Azure Pipelines',
20+
'CD',
21+
'CI',
22+
'CircleCI',
23+
'Containers',
24+
'Docker',
25+
'Fundamentals',
26+
'GitLab',
27+
'Google Kubernetes Engine',
28+
'Gradle',
29+
'Java',
30+
'JavaScript',
31+
'Jenkins',
32+
'Maven',
33+
'Migration',
34+
'Node',
35+
'Packaging',
36+
'Powershell',
37+
'Project management',
38+
'Publishing',
39+
'Python',
40+
'Ruby',
41+
'Security',
42+
'Travis CI',
43+
'Workflows',
44+
'access management',
45+
'accounts',
46+
'api',
47+
'billing',
48+
'cli',
49+
'codespaces',
50+
'community',
51+
'desktop',
52+
'device verification',
53+
'early access',
54+
'enterprise',
55+
'events',
56+
'github',
57+
'github apps',
58+
'github search',
59+
'identity',
60+
'issues',
61+
'jobs',
62+
'legal',
63+
'marketplace',
64+
'mobile',
65+
'notifications',
66+
'oauth apps',
67+
'open source',
68+
'organizations',
69+
'pages',
70+
'permissions',
71+
'policy',
72+
'profile',
73+
'profiles',
74+
'projects',
75+
'pull requests',
76+
'repositories',
77+
'security',
78+
'sponsors',
79+
'ssh',
80+
'sso',
81+
'teams',
82+
'usernames',
83+
'webhooks'
84+
]
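The allow list above is case-sensitive: it contains both `'Security'` and `'security'` as separate entries. A short sketch of how such case-only variants could be surfaced follows. The helper `findCaseVariants` is hypothetical and not part of this commit, and the array is only an excerpt of the list for illustration.

```javascript
// Small excerpt of the allow list from this commit, for illustration only.
const allowedTopicsExcerpt = ['Security', 'Workflows', 'security', 'ssh', 'sso']

// Hypothetical helper: groups entries that become identical when lowercased
// and returns only the groups with more than one spelling.
function findCaseVariants (topics) {
  const byLowercase = new Map()
  for (const topic of topics) {
    const key = topic.toLowerCase()
    byLowercase.set(key, [...(byLowercase.get(key) || []), topic])
  }
  return [...byLowercase.values()].filter(group => group.length > 1)
}

const caseVariants = findCaseVariants(allowedTopicsExcerpt)
console.log(caseVariants)
```

Whether case-only variants are intentional is not stated in the commit; the sketch only reports them.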

includes/head.html

+3
@@ -11,6 +11,9 @@
1111
{% if page.intro %}
1212
<meta name="description" content="{{ page.introPlainText }}">
1313
{% endif %}
14+
{% if page.topics %}
15+
<meta name="keywords" content="{{ page.topics | join: ',' }}">
16+
{% endif %}
1417
{% if page.hidden %}
1518
<meta name="robots" content="noindex" />
1619
{% endif %}
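The `keywords` meta tag added above is what the search indexer later reads back. The round trip can be sketched in plain JavaScript. This is an illustration under assumptions: `topicsToKeywords` and `keywordsToTopics` are hypothetical names that mirror the Liquid `join: ','` filter here and the `split(',')` call in lib/search/parse-page-sections-into-records.js.

```javascript
// Mirrors the Liquid `join: ','` filter used in includes/head.html.
function topicsToKeywords (topics) {
  return topics.join(',')
}

// Mirrors the split performed in parse-page-sections-into-records.js;
// a missing meta tag yields an empty topics array.
function keywordsToTopics (metaKeywords) {
  return metaKeywords ? metaKeywords.split(',') : []
}

const pageTopics = ['CI', 'Travis CI', 'Migration']
const keywords = topicsToKeywords(pageTopics)
const roundTripped = keywordsToTopics(keywords)
console.log(keywords, roundTripped)
```

Spaces inside a topic survive the round trip, but a topic that itself contained a comma would be split in two. None of the entries in the allow list from this commit contains a comma.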

javascripts/search.js

+2-2
@@ -261,8 +261,8 @@ function tmplSearchResult ({ url, breadcrumbs, heading, title, content }) {
261261
{ href: url, class: 'no-underline' },
262262
div(
263263
{ class: 'search-result-breadcrumbs d-block text-gray-dark opacity-60 text-small pb-1' },
264-
// Remove redundant title from the end of breadcrumbs
265-
markify((breadcrumbs || '').replace(` / ${title}`, ''))
264+
// Breadcrumbs in search records don't include the page title
265+
markify(breadcrumbs || '')
266266
),
267267
div(
268268
{ class: 'search-result-title d-block h4-mktg text-gray-dark' },

lib/search/algolia-search.js

+2-1
@@ -24,6 +24,7 @@ module.exports = async function loadAlgoliaResults ({ version, language, query,
2424
breadcrumbs: get(hit, '_highlightResult.breadcrumbs.value'),
2525
heading: get(hit, '_highlightResult.heading.value'),
2626
title: get(hit, '_highlightResult.title.value'),
27-
content: get(hit, '_highlightResult.content.value')
27+
content: get(hit, '_highlightResult.content.value'),
28+
topics: hit.topics
2829
}))
2930
}

lib/search/lunr-search-index.js

+1
@@ -46,6 +46,7 @@ module.exports = class LunrIndex {
4646
this.field('heading')
4747
this.field('title')
4848
this.field('content')
49+
this.field('topics')
4950
this.field('customRanking')
5051

5152
this.metadataWhitelist = ['position']

lib/search/lunr-search.js

+4-1
@@ -25,7 +25,9 @@ module.exports = async function loadLunrResults ({ version, language, query, lim
2525
breadcrumbs: field(result, record, 'breadcrumbs'),
2626
heading: field(result, record, 'heading'),
2727
title: field(result, record, 'title'),
28-
content: field(result, record, 'content')
28+
content: field(result, record, 'content'),
29+
// don't highlight the topics array
30+
topics: record.topics
2931
}
3032
})
3133
return results
@@ -48,6 +50,7 @@ async function loadLunrRecords (indexName) {
4850
.then(JSON.parse)
4951
}
5052

53+
// Highlight a match within an attribute field
5154
function field (result, record, name) {
5255
const text = record[name]
5356
if (!text) return text

lib/search/parse-page-sections-into-records.js

+17-4
@@ -12,7 +12,7 @@ const { maxContentLength } = require('./config')
1212

1313
module.exports = function parsePageSectionsIntoRecords (href, $) {
1414
const title = $('h1').text().trim()
15-
const breadcrumbs = $('nav.breadcrumbs a')
15+
const breadcrumbsArray = $('nav.breadcrumbs a')
1616
.map((i, el) => {
1717
return $(el)
1818
.text()
@@ -21,7 +21,18 @@ module.exports = function parsePageSectionsIntoRecords (href, $) {
2121
.replace(/\s+/g, ' ')
2222
})
2323
.get()
24-
.join(' / ')
24+
.slice(0, -1)
25+
26+
const breadcrumbs = breadcrumbsArray.join(' / ') || ''
27+
const metaKeywords = $('meta[name="keywords"]').attr('content')
28+
const topics = metaKeywords ? metaKeywords.split(',') : []
29+
30+
const productName = breadcrumbsArray[0] || ''
31+
topics.push(productName)
32+
// Remove "github" to make filter queries shorter
33+
if (productName.includes('GitHub ')) {
34+
topics.push(productName.replace('GitHub ', ''))
35+
}
2536

2637
let records
2738

@@ -54,7 +65,8 @@ module.exports = function parsePageSectionsIntoRecords (href, $) {
5465
breadcrumbs,
5566
heading,
5667
title,
57-
content
68+
content,
69+
topics
5870
}
5971
})
6072
.get()
@@ -74,7 +86,8 @@ module.exports = function parsePageSectionsIntoRecords (href, $) {
7486
url,
7587
breadcrumbs,
7688
title,
77-
content
89+
content,
90+
topics
7891
}]
7992
}
8093
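The breadcrumb and topic derivation in the hunks above can be isolated into a standalone sketch. The function name `deriveBreadcrumbsAndTopics` and its plain-array input are assumptions made for illustration; the real module reads these values from the parsed page with `$`.

```javascript
// Standalone sketch of the logic added to parsePageSectionsIntoRecords.
// `crumbTexts` stands in for the trimmed text of each `nav.breadcrumbs a` link,
// and `metaKeywords` for the content of the `keywords` meta tag.
function deriveBreadcrumbsAndTopics (crumbTexts, metaKeywords) {
  // Drop the last crumb, which is the page title.
  const breadcrumbsArray = crumbTexts.slice(0, -1)
  const breadcrumbs = breadcrumbsArray.join(' / ') || ''
  const topics = metaKeywords ? metaKeywords.split(',') : []

  // The first crumb is treated as the product name and added as a topic.
  const productName = breadcrumbsArray[0] || ''
  topics.push(productName)
  // Also add the product name without the "GitHub " prefix.
  if (productName.includes('GitHub ')) {
    topics.push(productName.replace('GitHub ', ''))
  }
  return { breadcrumbs, topics }
}

const withSections = deriveBreadcrumbsAndTopics(
  ['GitHub Actions', 'actions learning path', 'I am the page title'],
  'topic1,topic2'
)
const withoutCrumbs = deriveBreadcrumbsAndTopics([], undefined)
console.log(withSections, withoutCrumbs)
```

The first call uses the values from the updated page-with-sections fixture. The second call shows an edge case of the logic as written: with no breadcrumb links, an empty string is pushed into `topics`.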

script/content-migrations/add-tags-to-articles.js

+1
@@ -48,6 +48,7 @@ function updateFrontmatter (filePath, newTopics) {
4848
} else if (Array.isArray(data.topics)) {
4949
topics = topics.concat(data.topics)
5050
}
51+
5152
newTopics.forEach(topic => {
5253
topics.push(topic)
5354
})

tests/unit/algolia/fixtures/page-with-sections.html → tests/unit/search/fixtures/page-with-sections.html

+6-3
@@ -1,8 +1,11 @@
1+
<head>
2+
<meta name="keywords" content="topic1,topic2">
3+
</head>
14
<div class="article-grid-body">
25
<nav class="breadcrumbs">
3-
<a href="#">a</a>
4-
<a href="#">b</a>
5-
<a href="#">c</a>
6+
<a href="#">GitHub Actions</a>
7+
<a href="#">actions learning path</a>
8+
<a href="#">I am the page title</a>
69
</nav>
710

811
<h1>I am the page title</h1>

tests/unit/algolia/fixtures/page-without-sections.html → tests/unit/search/fixtures/page-without-sections.html

+6-3
@@ -1,10 +1,13 @@
1+
<head>
2+
<meta name="keywords" content="key1,key2,key3">
3+
</head>
14
<p>I am outside the article and should not be included</p>
25

36
<div class="article-grid-body">
47
<nav class="breadcrumbs">
5-
<a href="#">x</a>
6-
<a href="#">y</a>
7-
<a href="#">z</a>
8+
<a href="#">Education</a>
9+
<a href="#">map topic</a>
10+
<a href="#">A page without sections</a>
811
</nav>
912

1013
<h1>A page without sections</h1>

tests/unit/algolia/parse-page-sections-into-records.js → tests/unit/search/parse-page-sections-into-records.js

+9-6
@@ -20,19 +20,21 @@ describe('search parsePageSectionsIntoRecords module', () => {
2020
objectID: '/example/href#first',
2121
url: 'https://docs.github.com/example/href#first',
2222
slug: 'first',
23-
breadcrumbs: 'a / b / c',
23+
breadcrumbs: 'GitHub Actions / actions learning path',
2424
heading: 'First heading',
2525
title: 'I am the page title',
26-
content: "Here's a paragraph. And another."
26+
content: "Here's a paragraph. And another.",
27+
topics: ['topic1', 'topic2', 'GitHub Actions', 'Actions']
2728
},
2829
{
2930
objectID: '/example/href#second',
3031
url: 'https://docs.github.com/example/href#second',
3132
slug: 'second',
32-
breadcrumbs: 'a / b / c',
33+
breadcrumbs: 'GitHub Actions / actions learning path',
3334
heading: 'Second heading',
3435
title: 'I am the page title',
35-
content: "Here's a paragraph in the second section. And another."
36+
content: "Here's a paragraph in the second section. And another.",
37+
topics: ['topic1', 'topic2', 'GitHub Actions', 'Actions']
3638
}
3739
]
3840

@@ -50,9 +52,10 @@ describe('search parsePageSectionsIntoRecords module', () => {
5052
{
5153
objectID: '/example/href',
5254
url: 'https://docs.github.com/example/href',
53-
breadcrumbs: 'x / y / z',
55+
breadcrumbs: 'Education / map topic',
5456
title: 'A page without sections',
55-
content: 'First paragraph. Second paragraph.'
57+
content: 'First paragraph. Second paragraph.',
58+
topics: ['key1', 'key2', 'key3', 'Education']
5659
}
5760
]
5861
expect(records).toEqual(expected)
File renamed without changes.

tests/unit/search/topics.js

+32
@@ -0,0 +1,32 @@
1+
const path = require('path')
2+
const fs = require('fs')
3+
const readFrontmatter = require('../../../lib/read-frontmatter')
4+
const walk = require('walk-sync')
5+
const { difference } = require('lodash')
6+
const allowedTopics = require('../../../data/allowed-topics')
7+
8+
const contentDir = path.join(process.cwd(), 'content')
9+
const topics = walk(contentDir, { includeBasePath: true })
10+
.filter(filename => filename.endsWith('.md') && !filename.includes('README'))
11+
.map(filename => {
12+
const fileContent = fs.readFileSync(filename, 'utf8')
13+
const { data } = readFrontmatter(fileContent)
14+
return data.topics || []
15+
})
16+
.flat()
17+
18+
const allUsedTopics = [...new Set(topics)].sort()
19+
20+
describe('Check for allowed frontmatter topics', () => {
21+
test('all used topics are allowed in /data/allowed-topics.js', () => {
22+
expect(allUsedTopics.length).toBeGreaterThan(0)
23+
const disallowedTopics = difference(allUsedTopics, allowedTopics)
24+
expect(disallowedTopics).toEqual([])
25+
})
26+
27+
test('all allowed topics are used by at least one content file', () => {
28+
expect(allowedTopics.length).toBeGreaterThan(0)
29+
const unusedTopics = difference(allowedTopics, allUsedTopics)
30+
expect(unusedTopics).toEqual([])
31+
})
32+
})
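The two-way check in the test above can be sketched without the repository's helpers or lodash. `compareTopics` and the sample arrays are hypothetical. The sketch only illustrates the two set differences the test asserts are empty: topics used but not allowed, and topics allowed but not used.

```javascript
// Hypothetical standalone version of the two set differences checked by the test.
function compareTopics (usedTopics, allowedTopics) {
  const used = new Set(usedTopics)
  const allowed = new Set(allowedTopics)
  return {
    // Used in frontmatter but missing from the allow list (for example, a typo).
    notAllowed: [...used].filter(topic => !allowed.has(topic)).sort(),
    // Present in the allow list but not used by any content file.
    notUsed: [...allowed].filter(topic => !used.has(topic)).sort()
  }
}

// Sample data for illustration; 'Dokcer' is a deliberate typo.
const report = compareTopics(['CI', 'Docker', 'Dokcer'], ['CI', 'Docker', 'Gradle'])
console.log(report)
```

In the actual test both lists must come back empty, so a misspelled topic and a stale allow list entry each cause a failure.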
