In support of this post: http://toddwschneider.com/posts/techcrunch-bubble-index
- Scrapes all historical TechCrunch headlines back to mid-2005
- Parses TechCrunch's RSS feed to get new headlines as they're published
- Uses regular expressions to extract information from each headline
- Exposes an endpoint that returns a time series of TCBI values (i.e., the number of TechCrunch headlines over the past 90 days that specifically relate to startups raising money)
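The headline-matching and 90-day counting steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's actual code. The regular expression, the helper names `funding_headline?` and `trailing_counts`, and the hash-based article records are all hypothetical. A real Rails implementation would more likely query the database than filter an in-memory array.

```ruby
require "date"

# HYPOTHETICAL pattern: the app's real regular expressions are not shown here.
# Matches phrasings like "raises $10M" or "closes $1.5 billion round".
FUNDING_PATTERN = /\b(raises|raised|closes|lands|secures|nabs)\b.*\$\s?\d+(\.\d+)?\s?(k|m|mm|million|b|billion)\b/i

# True if a headline looks like a startup announcing a funding round.
def funding_headline?(headline)
  FUNDING_PATTERN.match?(headline)
end

# HYPOTHETICAL helper: for each day in from..to, count funding headlines
# published in the trailing `window` days, i.e. dates in (day - window, day].
def trailing_counts(articles, from:, to:, window: 90)
  funding_dates = articles
    .select { |a| funding_headline?(a[:title]) }
    .map { |a| a[:date] }

  (from..to).map do |day|
    count = funding_dates.count { |d| d > day - window && d <= day }
    [day, count]
  end
end

# Example usage with made-up headlines (not real TechCrunch data).
articles = [
  { title: "Acme raises $10M Series A", date: Date.new(2015, 1, 10) },
  { title: "Widgets Inc. closes $1.5B round", date: Date.new(2015, 3, 1) },
  { title: "Ten tips for better email", date: Date.new(2015, 3, 2) }
]

trailing_counts(articles, from: Date.new(2015, 3, 1), to: Date.new(2015, 3, 3))
  .each { |day, n| puts "#{day} #{n}" }
```

Recomputing each day's count by scanning all funding dates keeps the sketch short. For a decade of daily values, a running sum over per-day counts would be cheaper.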
There's also a simple JSON API endpoint at http://tcbi.toddwschneider.com/data, which returns the up-to-date output of TechcrunchArticle.running_total