Update README.md

The-Gupta · Jun 25, 2020 · c733deb
1 parent a790383
Showing 1 changed file with 21 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -4,3 +4,24 @@ Web Scraping of TED.com for complete Metadata, Transcript, Audio, Video, Images
 Environment: Google Colab with Google Drive without any Hardware Accelerator
 
 Scraped Data: https://www.kaggle.com/thegupta/ted-talk
+
+
+### Context
+
+I was looking for an interesting dataset for a personal Data Science project, and I'm a fan of TED. So I looked for a TED dataset and found [Rounak's](https://www.kaggle.com/rounakbanik/ted-talks), but it is incomplete and outdated. Then [I scraped it myself](https://github.com/The-Gupta/TED-Scraper/blob/master/Scraper.ipynb) and made the scraper super fast using Parallel Programming. Now it **downloads the Metadata and Transcripts of all 4609 Talks on the website in 300 seconds**\*. This is the **most comprehensive TED Talk dataset**, and it includes [media files](https://drive.google.com/drive/folders/1clqw9izazxafPDuIekXQYYdI-J42VvCR) (images, audio, and video) too!
+\*Scraped on 24-JUN-20. Anyone can scrape the entire TED.com with the code to get the latest dataset in 5 minutes.
+Downloading the media files takes extra time: 2 minutes for the Speaker and Talk photos.
+
+### Content
+
+Each row corresponds to a Talk on TED.com, and its columns hold the Metadata (generic, speaker-related, and talk-related information) plus the Transcript.
+
+
+### Acknowledgements
+
+I thank Google for Colab.
+
+
+### Inspiration
+
+We've got the entire TED.com in an Excel sheet, so let's find some INSIGHTS WORTH SHARING!
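The Context section credits the speed-up to Parallel Programming. The linked notebook is the authoritative reference; purely as a hedged illustration, here is a minimal sketch of one common approach for I/O-bound scraping in Python: a thread pool from the standard library's `concurrent.futures`. The names `scrape_all`, `fake_fetch_talk`, and `TALK_URL` are hypothetical and not taken from the notebook, and a stub fetcher stands in for the real HTTP request and HTML parsing so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical URL pattern, for illustration only; not taken from the notebook.
TALK_URL = "https://www.ted.com/talks/{slug}"


def scrape_all(
    slugs: List[str],
    fetch_talk: Callable[[str], Dict[str, str]],
    max_workers: int = 32,
) -> List[Dict[str, str]]:
    """Fetch every talk concurrently and return one record per slug.

    Scraping is I/O-bound (most time is spent waiting on the network),
    so threads can overlap those waits despite the GIL. executor.map
    yields results in the same order as the input slugs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_talk, slugs))


def fake_fetch_talk(slug: str) -> Dict[str, str]:
    # Hypothetical stand-in for a real HTTP request + HTML parsing step.
    return {"url": TALK_URL.format(slug=slug), "title": slug.replace("_", " ").title()}


if __name__ == "__main__":
    rows = scrape_all(["talk_one", "talk_two"], fake_fetch_talk, max_workers=4)
    for row in rows:
        print(row)
```

Each returned dict would correspond to one row of the table described under Content; passing the list to `pandas.DataFrame` is one way to build that table. Threads are used here rather than processes because the work is dominated by network waits, not CPU.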