Use python-api-flickr to retrieve tags and location from Flickr photos.

daureg/illalla

Latest commit

2e6870b · Jan 17, 2016
Code supporting my Master's thesis on finding similar neighborhoods across cities using social media activity.

You can get more information by reading our blog post, a two-page academic summary, our ICWSM paper or, if you have more time on your hands, my complete thesis.

If you're interested in our dataset, you can find it on Figshare. As Foursquare prohibits distributing out-of-date venue information, you will have to use FillDB.py to collect the latest statistics about the venues.

Below I provide more technical details. Note, however, that for now not all the data are included with the code (although the most important can be found on Dropbox), so there is no simple demonstration one can quickly test. Hopefully this will soon be remedied :)

How it works

All the code is written in Python 2 (but is known to work under Python 3.4 as well after automatic 2to3 conversion), and dependencies can be installed from requirements.txt. Map rendering is controlled by a Flask app and involves Leaflet as well as some JavaScript in the static directory.

First we collect data from two social media platforms, Foursquare and Flickr. Then we aggregate the data at the venue level. Each venue becomes a feature vector and is stored in a matrix.

Finally, we devise a method that, given a polygon in one city, computes the k most similar ones in another city.

Data collection

Flickr

  • grab_photos.py Retrieves a list of all Flickr photos taken in a given city and inserts them, with additional metadata, into a Mongo database

Twitter

  • twitter.py Listens to the public Twitter stream for Foursquare check-ins
  • boost_twitter.py Gathers more tweets by fetching the timelines of previously discovered users

Foursquare

  • AskFourquare.py Parses JSON Foursquare responses into the relevant Python objects
  • CheckinAPICrawler.py Gets check-in information by querying the Foursquare API. Unfortunately, it has not worked since Foursquare deployed its Swarm application, although it could be modified to handle this new case
  • FillDB.py Uses the collected tweets and AskFourquare to request information about users and venues before inserting it into a Mongo database
  • FSCategories.py Maintains a tree of Foursquare categories and provides query methods
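To make the idea of a category tree with query methods concrete, here is a minimal sketch in plain Python. It is not the code from FSCategories.py: the names `CategoryNode`, `path_to` and `top_level_of`, as well as the toy categories, are assumptions chosen for illustration only.

```python
class CategoryNode(object):
    """One node of a category tree (hypothetical structure for illustration)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def path_to(root, target):
    """Return the list of names from root down to target, or None if absent."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        sub = path_to(child, target)
        if sub is not None:
            return [root.name] + sub
    return None


def top_level_of(root, target):
    """Return the direct child of root whose subtree contains target."""
    path = path_to(root, target)
    if path is None or len(path) < 2:
        return None
    return path[1]


# Toy tree with made-up entries; the real Foursquare hierarchy is much larger.
tree = CategoryNode('Venue', [
    CategoryNode('Food', [CategoryNode('Cafe'), CategoryNode('Bakery')]),
    CategoryNode('Travel', [CategoryNode('Train Station')]),
])
```

A query such as `top_level_of(tree, 'Bakery')` maps a fine-grained category back to its top-level parent, which is one plausible use of such a tree when aggregating venues by category.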

Data processing

  • VenueFeature.py The main step: transforms the raw collected data into a feature matrix whose rows are the venues of a city with enough visits and whose columns are the features described in table 3, page 27 of the thesis
  • Surrounding.py Maintains a k-d tree of venues to allow spatial ball queries
  • FlickrVsFoursquare.py Computes the discrepancy between tweets (check-ins) and photos. It is not a feature associated with a single venue, so it was not used in the thesis, but it is interesting nonetheless. Basically, it divides a city into a grid and finds the cells where the proportion of check-ins to photos is unusual. This distinguishes very touristic locations that are mostly photographed (Eiffel Tower) from locations where things happen (stadium, railway station, …)
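The grid comparison described for FlickrVsFoursquare.py can be sketched as follows. This is only an illustration under assumptions: the function names, the regular n-by-n grid over a bounding box, and the simple "deviation from the city-wide photo share" criterion are mine, not necessarily what the script does. Points are assumed to lie inside the bounding box.

```python
from collections import Counter


def cell_of(lon, lat, bbox, n):
    """Map a point to the (row, col) of an n-by-n grid laid over bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    # min(..., n - 1) keeps points on the upper edge inside the last cell.
    col = min(int((lon - min_lon) / (max_lon - min_lon) * n), n - 1)
    row = min(int((lat - min_lat) / (max_lat - min_lat) * n), n - 1)
    return row, col


def photo_share_by_cell(checkins, photos, bbox, n):
    """Return {cell: share of photos among all activity in that cell}."""
    c_count = Counter(cell_of(lon, lat, bbox, n) for lon, lat in checkins)
    p_count = Counter(cell_of(lon, lat, bbox, n) for lon, lat in photos)
    shares = {}
    for cell in set(c_count) | set(p_count):
        total = c_count[cell] + p_count[cell]
        shares[cell] = p_count[cell] / float(total)
    return shares


def unusual_cells(shares, overall, threshold):
    """Cells whose photo share deviates from the overall share by more than threshold."""
    return sorted(c for c, s in shares.items() if abs(s - overall) > threshold)


# Made-up (lon, lat) points in a 2-by-2 degree box, for illustration only.
bbox = (0.0, 0.0, 2.0, 2.0)
checkins = [(1.5, 1.5), (1.6, 1.4), (1.2, 1.9), (0.5, 0.5)]
photos = [(0.2, 0.3), (0.4, 0.1), (0.6, 0.7), (1.5, 1.5)]
shares = photo_share_by_cell(checkins, photos, bbox, 2)
overall = len(photos) / float(len(photos) + len(checkins))
```

In this toy data the lower-left cell is mostly photographed and the upper-right cell is mostly checked into, so both stand out against the city-wide share of 0.5.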

Computation

  • worldwide.py Defines query_in_one_city, which performs a single similarity query from a GeoJSON polygon in one city to another city
  • one_approx_query.py Illustrates the use of worldwide.py on a predefined set of queries
  • approx_emd.py Avoids computing the Earth Mover's Distance between all possible rectangles in the target city by using the pruning strategy described in section 6.1, page 38 of the thesis
  • ClosestNeighbor.py Performs k-nearest neighbor queries over venues in two cities
  • neighborhood.py This one does too many things, and most of them turned out not to work anyway
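As an illustration of a k-nearest neighbor query over venues in two cities, the sketch below finds, for one venue of the origin city, the k closest venues of the destination city in feature space. The Euclidean distance, the function names and the toy feature vectors are assumptions for the example; the distance actually used by ClosestNeighbor.py may differ.

```python
import heapq
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_nearest(query, candidates, k):
    """Return the k (distance, venue_id) pairs closest to query.

    candidates maps a venue id to its feature vector.
    """
    scored = ((euclidean(query, vec), vid) for vid, vec in candidates.items())
    return heapq.nsmallest(k, scored)


# Made-up two-dimensional feature vectors, for illustration only.
origin_venue = [1.0, 0.0]
dest_venues = {'a': [1.0, 0.0], 'b': [0.0, 0.0], 'c': [4.0, 4.0]}
neighbors = k_nearest(origin_venue, dest_venues, 2)
```

This brute-force scan is linear in the number of destination venues; a spatial index would only matter for much larger inputs.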

Rendering

  • ServeNN.py A Flask web server with the following interesting routes
    • /n/<origin>/<dest> offers a selection of neighborhoods
    • /<origin>/<dest>/<int:knn> lets you interactively pick a venue in the origin city and shows its k nearest neighbors in the dest city

Helper

  • cities.py Defines the 20 cities we chose by their bounding boxes and provides methods to convert between latitude/longitude and local Euclidean coordinates
  • arguments.py
  • calc_tsne.py
  • Chunker.py
  • clean_timeline.py
  • CommonMongo.py
  • Counter.py
  • LocalCartesian.py
  • OrderedDict.py
  • persistent.py
  • RequestsMonitor.py
  • explore.py
  • twitter_helper.py
  • utils.py
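The conversion between latitude/longitude and local Euclidean coordinates mentioned for cities.py can be approximated at city scale with an equirectangular projection around a reference point. The sketch below is an assumption about one reasonable way to do it, not the projection the repository uses; `to_local`, `to_geo` and the reference point are hypothetical.

```python
import math

EARTH_RADIUS = 6371000.0  # mean Earth radius in meters


def to_local(lat, lon, lat0, lon0):
    """Project (lat, lon) to meters east (x) and north (y) of (lat0, lon0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS
    y = math.radians(lat - lat0) * EARTH_RADIUS
    return x, y


def to_geo(x, y, lat0, lon0):
    """Inverse of to_local: meters east/north back to (lat, lon)."""
    lat = lat0 + math.degrees(y / EARTH_RADIUS)
    lon = lon0 + math.degrees(x / (EARTH_RADIUS * math.cos(math.radians(lat0))))
    return lat, lon


# Hypothetical reference point, e.g. the corner of a city's bounding box.
LAT0, LON0 = 48.80, 2.25
x, y = to_local(48.86, 2.35, LAT0, LON0)
lat, lon = to_geo(x, y, LAT0, LON0)
```

The approximation treats the city as flat, which is adequate over a few tens of kilometers but degrades over larger extents or near the poles.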

Not (directly) useful

  • 0708tue.py
  • 0813wed_map.py
  • alt_emd.py
  • bench.py
  • CheckinCrawler.py
  • cluster_city.py
  • common_tag.py
  • compare_tags.py
  • compile_cython.py
  • CorrectCheckIn.py
  • extract_dataset.py
  • extract_gold.py
  • figure4.py
  • first_query.py
  • gen_status.py
  • geom_stat.py
  • get_brands.py
  • ir_evaluation.py
  • LDA.py
  • learn_weights.py
  • local_vs_tourist.py
  • merge_gold.py
  • more_query.py
  • nldm.py
  • outplot.py
  • places_and_venues.py
  • plot_corr.py
  • plot_tag.py
  • preprocess.py
  • ProgressBar.py
  • rank_disc.py
  • read_foursquare.py
  • report_metrics_results.py
  • saved3.py
  • seetags.py
  • selection.py
  • significance_test.py
  • spatial_scan.py
  • specific_emd_dst.py
  • tag_support.py
  • time_all_cities.py
  • top_metrics_circle.py
  • VenueIdCrawler.py
  • wordplot.py
  • emd_leftover.py
