Transform document content into a set of questions using the T5 model.
- Setup
- Generating Questions for GitHub Content Pages
- Reading GitHub Links
- Generating Questions for Headers
- Sample Questions
Add the links for which you want to generate questions to the list below and set the number of questions to generate per document:
topic_links = [
"https://github.com/javascript-tutorial/en.javascript.info/blob/master/3-frames-and-windows/01-popup-windows/article.md",
...
]
num_questions_per_link = 30
- Download the content from a link.
- Split the content into sections by header.
- Generate questions for each section, header by header.
- Write the questions to a JSON file.
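The steps above can be sketched as a single loop. This is a minimal illustration rather than the notebook's actual code: `run_pipeline`, its `fetch`, `split`, and `generate` callables, the even split of `num_questions_per_link` across sections, and the JSON layout are all assumptions made for the example.

```python
import json


def run_pipeline(links, fetch, split, generate, num_questions_per_link, output_path):
    """Download each link, split it by header, generate questions, and save them as JSON.

    fetch(link) -> str, split(text) -> list of (header, body) pairs, and
    generate(body, n) -> list of str are hypothetical callables supplied by the caller.
    """
    results = {}
    for link in links:
        # Keep only sections that actually have body text to ask questions about.
        sections = [(h, b) for h, b in split(fetch(link)) if b.strip()]
        # Assumption: the per-link budget is divided evenly across sections.
        per_header = max(1, num_questions_per_link // max(1, len(sections)))
        results[link] = [
            {"header": header, "questions": generate(body, per_header)}
            for header, body in sections
        ]
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)
    return results
```

Passing the three stages in as callables keeps the loop independent of how the content is downloaded or which model produces the questions.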
Functions for reading and parsing Markdown content from GitHub.
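As a rough sketch of what such functions might look like, the example below rewrites a GitHub `blob` URL into its `raw.githubusercontent.com` equivalent, downloads the file, and splits the Markdown at ATX-style (`#`) headers. The names `to_raw_url`, `fetch_markdown`, and `split_by_headers` are hypothetical and not taken from the notebook. Setext-style (underlined) headers are not handled.

```python
import re
import urllib.request

BLOB_RE = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/blob/(.+)$")
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # Markdown code-fence marker


def to_raw_url(blob_url):
    """Map a github.com .../blob/<ref>/<path> URL to its raw-content URL."""
    match = BLOB_RE.match(blob_url)
    if not match:
        raise ValueError(f"Not a GitHub blob URL: {blob_url}")
    owner, repo, rest = match.groups()
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"


def fetch_markdown(blob_url):
    """Download the raw Markdown text behind a GitHub blob URL."""
    with urllib.request.urlopen(to_raw_url(blob_url)) as response:
        return response.read().decode("utf-8")


def split_by_headers(markdown):
    """Return (header, body) pairs; text before the first header gets header ''."""
    sections, header, body, in_fence = [], "", [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside code fences are not headers
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            if header or any(l.strip() for l in body):
                sections.append((header, "\n".join(body).strip()))
            header, body = match.group(2), []
        else:
            body.append(line)
    if header or any(l.strip() for l in body):
        sections.append((header, "\n".join(body).strip()))
    return sections
```

Returning a list of pairs instead of a dictionary preserves document order and keeps sections apart when two headers share the same text.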
Uses the doc2query/msmarco-t5-base-v1 model to generate questions from the content under each header.
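A minimal sketch of this step with the `transformers` library is shown below. The function names, the sampling settings (`top_p=0.95`, 320 input tokens, 64 output tokens), and the de-duplication step are illustrative assumptions, not necessarily what the notebook does. Because the questions are sampled, the function may return fewer than `num_questions` once repeats are removed.

```python
from functools import lru_cache


def dedupe_questions(questions):
    """Drop blank and repeated questions (case-insensitive), keeping first-seen order."""
    seen, unique = set(), []
    for question in questions:
        key = question.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(question.strip())
    return unique


@lru_cache(maxsize=1)
def load_model(model_name="doc2query/msmarco-t5-base-v1"):
    """Load the tokenizer and model once; later calls reuse the cached pair."""
    # Imported lazily so dedupe_questions works without the heavy dependencies.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def generate_questions(text, num_questions=5):
    """Sample up to num_questions distinct questions for one section of text."""
    tokenizer, model = load_model()
    # Long sections are truncated to the first 320 tokens (an assumed limit).
    input_ids = tokenizer.encode(
        text, max_length=320, truncation=True, return_tensors="pt"
    )
    outputs = model.generate(
        input_ids=input_ids,
        max_length=64,
        do_sample=True,  # sampling yields varied questions instead of one best guess
        top_p=0.95,
        num_return_sequences=num_questions,
    )
    decoded = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    return dedupe_questions(decoded)
```

Calling `generate_questions(section_text, num_questions=5)` downloads the model weights on first use, so the first call is noticeably slower than later ones.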
Examples of generated questions can be found in the test.json file.
Note: This notebook requires the transformers and sentence-transformers packages to be installed.