I use Google Gemini's long context on data scraped from Illinois State University (ISU) to create the transcript of a promotional video. I crawled the data from the ISU website. When fed the whole dataset, Gemini reported more than 5M tokens, so I tried chunks of one sixth of the data instead. The data is available in the repo.
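For illustration, here is a minimal sketch of how the crawled text could be cut into six pieces of near-equal size. The function name `split_into_chunks` is hypothetical and not taken from the repo, and it splits by characters rather than tokens, so the token count of each piece is only roughly one sixth of the total.

```python
def split_into_chunks(text: str, n_chunks: int = 6) -> list[str]:
    """Split text into n_chunks contiguous pieces of near-equal length.

    Hypothetical helper: it splits on character counts, not tokens,
    so per-chunk token counts are only approximately equal.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, remainder = divmod(len(text), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks take one extra character each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(text[start:end])
        start = end
    return chunks
```

Joining the returned chunks in order gives back the original text, so nothing is lost at the boundaries, although a sentence may be cut in half where two chunks meet.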
Be sure to set mykey to your own Gemini API key, obtained here: https://aistudio.google.com/apikey
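One way to avoid pasting the key into the code is to read it from an environment variable. This is only a sketch: the variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions for illustration, not something the repo defines.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name you exported.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Gemini API key")
    return key
```

With the variable exported in your shell, `mykey = load_api_key()` sets mykey without the key ever appearing in the notebook.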
I used the crawling capabilities by Yatharth Bisht on Kaggle: https://www.kaggle.com/code/yatharthbisht/generating-openapi-specs-for-crms-using-gemini
Happy Coding!