
Commit

Add talks as JSON
redapple committed Nov 15, 2017
1 parent 9ccaf12 commit b9f8710
Showing 49 changed files with 941 additions and 0 deletions.
@@ -0,0 +1,18 @@
{
"description": "Introduction\n~~~~~~~~~~~~\n\nReverse geocoding using online web services such as Google Maps is\nincredibly slow and is also restrictive in terms of the number of\nrequests that can be made per day. Offline reverse geocoders have been\nbuilt for PostGIS databases and also Python but are either complicated\nor slow. In this talk, I will be presenting a fast, offline reverse\ngeocoder in Python. The basic outline of the talk is presented below.\n\nThe Library\n~~~~~~~~~~~\n\nThe library improves on an existing one built by Richard Penman in the\nfollowing ways:\n\n1. It supports Python 2 and 3.\n2. It geocodes a lot more location information. Besides the place name,\n city and country, the library returns the administrative regions (1 &\n 2) and the nearest latitude and longitude.\n3. But the key enhancement is performance. The library extends the K-D\n tree class in the scipy package and implements a parallelised version\n of it.\n\nThis reverse geocoder is released under the LGPL license and is\navailable `here <https://github.com/thampiman/reverse-geocoder>`__.\n\nImplementation\n~~~~~~~~~~~~~~\n\nThe first time the library is called, information on places with a\npopulation greater than 1000 is downloaded from the\n`Geonames <http://download.geonames.org/export/dump/>`__ database, and\nit is stored locally. The GPS coordinates of these places are populated\nin a K-D tree and the nearest neighbour (NN) algorithm is then used to\nfind the place closest to the input GPS coordinate. The scipy package\nprovides a `K-D tree\nclass <http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html%20%22cKDTree%22>`__\nand this is extended to implement a multi-process version. In this talk,\nI will be presenting details of this implementation. A basic background\nin Python, numpy, multi-processing and shared memory is assumed. The K-D\ntree class in the scipy package supports only the Minkowski p-norm\ndistance for the NN algorithm. Although this has not been released\npublicly, I will also be presenting a version of the library using the\nhaversine formula for much more accurate geocoding.\n\nPerformance Study\n~~~~~~~~~~~~~~~~~\n\nThe library supports two modes:\n\n1. Single-process mode (Mode 1)\n2. Multi-process mode (Mode 2): The default mode\n\nA performance comparison of the two modes on a quad-core Macbook Pro is\nshown below. |Performance Comparison|\n\nMode 2 runs 2x faster especially for large inputs, i.e. 10M coordinates.\n\nApplications\n~~~~~~~~~~~~\n\nIn this part of the talk, I will discuss how the library is being used\nat `OpenSignal <http://opensignal.com/>`__, where I work as a data\nscientist. The main purpose for building the library was to be able to\ngeocode terabytes of data (approx. 500M coordinates). Speed was\ntherefore crucial. I will discuss methods on geocoding at this scale in\nreal-time and also offline. I will also talk about how this open-source\nlibrary is being used by the community.\n\nContributions by the Community\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSince its release on Github on 27-Mar-2015, the open-source community\nhas also been instrumental in testing, fixing bugs and implementing\nadditional features. In this part of the talk, I will given an overview\nof the following two major changes made by other developers:\n\n1. Python 3 support, and\n2. `C++\n wrapper <https://github.com/thampiman/reverse-geocoder/tree/master/c++>`__\n for the Python library.\n\n.. 
|Performance Comparison| image:: https://raw.githubusercontent.com/thampiman/reverse-%20geocoder/master/performance.png\n\n",
"duration": 962,
"language": "eng",
"recorded": "2015-06-21",
"speakers": [
"Ajay Thampi"
],
"summary": "A fast, offline reverse geocoder in Python. This implementation uses\na parallelised K-D tree and the details of this implementation will\nbe presented. The key feature is speed; 10 million coordinates can be\ngeocoded in less than 30 seconds. The library is released under the\nLGPL license and is available at\nhttps://github.com/thampiman/reverse-geocoder.",
"thumbnail_url": "https://i.ytimg.com/vi/8TR3RxJXjr0/hqdefault.jpg",
"title": "A Fast, Offline Reverse Geocoder in Python",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=8TR3RxJXjr0"
}
]
}
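The implementation described above boils down to building a K-D tree over the Geonames coordinates and querying nearest neighbours. A minimal single-process sketch of that idea with scipy's cKDTree (the place records are an invented three-row sample, not the Geonames dump, and this is not the library's actual parallelised code):

import numpy as np
from scipy.spatial import cKDTree

# Invented three-row sample of Geonames-style records: (lat, lon, name, country).
places = [
    (51.5074, -0.1278, "London", "GB"),
    (48.8566, 2.3522, "Paris", "FR"),
    (40.7128, -74.0060, "New York City", "US"),
]

# Build the K-D tree over the coordinates (Minkowski p-norm distance, as in scipy).
coords = np.array([(lat, lon) for lat, lon, _, _ in places])
tree = cKDTree(coords)

def reverse_geocode(lat, lon):
    """Return the record of the nearest known place to (lat, lon)."""
    _, idx = tree.query((lat, lon), k=1)
    return places[idx]

print(reverse_geocode(51.75, -1.26))  # nearest of the three samples: London

According to the description above, the library's default multi-process mode parallelises these queries across cores, which is where the roughly 2x speed-up on 10M-coordinate inputs comes from.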
@@ -0,0 +1,17 @@
{
"description": "Are you used to running your models on a laptop? Do you want to know how\nto run it in a production environment? Prepare yourself for a whirlwind\ntour of what it takes to run a model 24/7. We'll be looking at\nBloomberg's infrastructure for running utility market models. By the\nend, you'll have a good idea what it takes to achieve this without being\nwoken up at 5am more than a couple of times.",
"duration": 2017,
"language": "eng",
"recorded": "2015-06-21",
"speakers": [
"Alex Chamberlain"
],
"thumbnail_url": "https://i.ytimg.com/vi/rYlVcPs4JQI/hqdefault.jpg",
"title": "Deploying a Model to Production",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=rYlVcPs4JQI"
}
]
}
@@ -0,0 +1,18 @@
{
"description": "Targeting knowledge graphs completion is a recent paradigm that allow\nextraction of new relations (facts) from existing knowledge graphs like\nFreebase or GeneOntology. Word embeddings represents each entity into a\nlow dimensional space and the relationships as vectorial transformations\nwhich has the advantage of making the search space continuous. This\nallows to encode the entities and transformations with global\ninformation from the entire graph. On the other hand, word embedding\napproaches, like word2vec, extracted from unlabeled text allows\nrepresentations of words as vectors, although it doesn't allow to\nextract relationships . By careful alignment of entities from free text\nwith a knowledge graph it is possible to combine both approaches and\njointly extract new knowledge through relationships between entities and\nwords / phrases. We will show results from applying this technology to\nbiomedical data.\n",
"duration": 2031,
"language": "eng",
"recorded": "2015-06-21",
"speakers": [
"Armando Vieira"
],
"summary": "Recent advances in combining structured graph data with textual data\nusing embedding word representations from a large corpus of\nunlabelled data. This allows to expand the knowledge base graph and\nextract complex semantic relationships.",
"thumbnail_url": "https://i.ytimg.com/vi/UAMGMMqjHuY/hqdefault.jpg",
"title": "Jointly Embedding knowledge from large graph databases with textual data using deep learning",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=UAMGMMqjHuY"
}
]
}
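The "relationships as vectorial transformations" idea above is commonly realised as translation-style embeddings, where a triple (head, relation, tail) is scored by how close head + relation lands to tail. A toy numpy sketch of that scoring, with random placeholder embeddings and invented biomedical entity names (not the trained models from the talk):

import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Hypothetical entity and relation embeddings (random placeholders, invented names).
entity = {name: rng.normal(size=dim) for name in ["aspirin", "headache", "ibuprofen"]}
relation = {"treats": rng.normal(size=dim)}

def score(head, rel, tail):
    """Translation-style plausibility: smaller ||h + r - t|| means more plausible."""
    return np.linalg.norm(entity[head] + relation[rel] - entity[tail])

# Rank candidate tails for the query ("aspirin", "treats", ?).
candidates = sorted(entity, key=lambda tail: score("aspirin", "treats", tail))
print(candidates)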
@@ -0,0 +1,18 @@
{
"description": "The daily commute in London is an adventure that keeps surprising, in\ngood ways and in bad. Some mornings things go smoothly: for once there\nis an empty seat on the Northern Line and you only need to wait ten\nminutes before reaching the escalator at Bank. On other days, a signal\nfailure at Moorgate and all Piccadilly line travellers will become\nintimately acquainted with the sweet fragrance of a fellow commuter's\ntoothpaste. After a few months, the more or less frustrated commuter\nwill begin to notice some interesting patterns and ask questions. Why is\nit that a suspended Circle line can wreak havoc on commuters on the\nCentral line? Can delays at Victoria lead to overcrowding of the Jubilee\nplatform at London Bridge? What is the fastest way to visit all stations\nin the Tube network?\n\nThe London Tube can be modelled as a graph with vertices representing\nthe stations and edges representing the connecting Tube lines. Using\nintuition and daily experience, we can guess that there are certain\nvertices in the Tube network that are vital to the overall health of the\nnetwork. Pythonic graph analysis libraries such as graph-tool and\ninteractive visualizations with Bokeh can corroborate that suspending\nstations \u2013 for example Baker Street, which lies at the intersection of 5\nTube lines- can cause congestion even at remote stations throughout the\nnetwork.\n\nIn addition to studying the static properties of the London Tube, we\nwill leverage the power of SimPy to create various simulations. This\nwill allow us to explore how the Tube network evolves when key\nparameters such as commuter numbers, train speeds and the frequency of\nsignal failures are introduced on and off peak times.\n\nThis talk is aimed at novice Python users. The attendees will hopefully\ncome away with a basic understanding of how to use the graph-tool\nlibrary to create graphs and store meta-information about them, how to\nuse Bokeh to stream information to an interactive browser visualization\nand how to use SimPy to create simple simulations. While it will not be\ntechnically challenging, it will hopefully inspire beginner Pythonistas\nto seek interesting problems and apply libraries in the Python data\nanalysis ecosystem to pose questions about data and then figure out the\nanswers!\n",
"duration": 2200,
"language": "eng",
"recorded": "2015-06-20",
"speakers": [
"Camilla Montonen"
],
"summary": "If you work in London, you have certainly experienced the joys and\nfrustrations of commuting on the Tube. There are the very good mornings\n(lo and behold! An empty seat on the Northern Line) and then there are\nthe very bad mornings (there is more space in a sardine can than on the\nPiccadilly). Using graph-tool, Bokeh and SimPy, we unravel some of the\nfascinating features of the London Tube.\n",
"thumbnail_url": "https://i.ytimg.com/vi/oWDhvKewVVA/hqdefault.jpg",
"title": "A Tube Story: How can Python help us understand London's most important transportation network?",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=oWDhvKewVVA"
}
]
}
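A minimal sketch of the graph modelling described above, using graph-tool: stations become vertices with a name property, connections become edges, and betweenness centrality gives one rough measure of how vital a station is. The five-station fragment is illustrative, not the full Tube network:

from graph_tool.all import Graph, betweenness

g = Graph(directed=False)
name = g.new_vertex_property("string")

# Illustrative five-station fragment, not the full Tube network.
stations = ["Baker Street", "Bond Street", "Oxford Circus", "Green Park", "Westminster"]
v = {s: g.add_vertex() for s in stations}
for s in stations:
    name[v[s]] = s

connections = [("Baker Street", "Bond Street"), ("Bond Street", "Oxford Circus"),
               ("Oxford Circus", "Green Park"), ("Green Park", "Westminster"),
               ("Baker Street", "Oxford Circus")]
for a, b in connections:
    g.add_edge(v[a], v[b])

# Betweenness centrality: stations that sit on many shortest paths are "vital".
vertex_bc, edge_bc = betweenness(g)
for s in stations:
    print(name[v[s]], vertex_bc[v[s]])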
@@ -0,0 +1,21 @@
{
"description": "The objective of this tutorial is to give participants the skills\nrequired to validate, evaluate and fine-tune models using scikit-learn\u2019s\nevaluation metrics and parameter search capabilities. It will combine\nboth the theoretical rationale behind these methods and their code\nimplementation.\n\nThe session will be structured as follows (rough timings in\nparentheses):\n\n1. Explanation of over-fitting and the bias-variance trade-off, followed\n by a brief conceptual overview of cross-validation, bootstrapping,\n and ensemble methods, in particular with respect to bias and\n variance. Pointers to the corresponding scikit-learn functions will\n also be given. (20 minutes)\n2. Implementation of cross-validation and grid-search method for\n parameter tuning, using KNN classification as an illustrative\n example. Participants will train two KNN neighbours with different\n numbers of neighbours on preprocessed data (provided). They will then\n be guided through cross-validation, plotting of results, and\n grid-search to find the best neighbour and weight configuration(s).\n (30 minutes)\n3. Comparison of different classification models using cross-validation.\n Participants will implement a logistic regression, linear and\n non-linear support vector machine (SVM) or neural network model and\n apply the same cross-validation and grid search method as in the\n guided KNN example. Participants will then compare their plots,\n evaluate their results and discuss which model they might choose for\n different objectives, trading off generalisability, accuracy, speed\n and randomness. (70 minutes)\n\nWe assume participants will be familiar with numpy, matplotlib, and at\nleast the intuition behind some of the main classification algorithms.\nBefore the tutorial, participants with github accounts should fork from\nhttps://github.com/cambridgecoding/pydata-tutorial or download the files\nand iPython notebook so they can participate in the hands on activities.\nRequired libraries: numpy, scikit-learn, matplotlib, pandas, scipy,\nmultilayer\\_perceptron (provided)\n",
"duration": 3754,
"recorded": "2015-06-19",
"speakers": [
"Chih-Chun Chen",
"Elena Chatzimichali"
],
"summary": "This hands-on tutorial will show you how to use scikit-learn\u2019s model\nevaluation functions to evaluate different models in terms of\naccuracy and generalisability, and search for optimal parameter\nconfigurations.\n",
"tags": [
"tutorial"
],
"thumbnail_url": "https://i.ytimg.com/vi/oKHeAtOgMNA/hqdefault.jpg",
"title": "How \u201cgood\u201d is your model, and how can you make it better?",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=oKHeAtOgMNA"
}
]
}
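A compact sketch of the cross-validation and grid-search step from part 2 of the session outline, using scikit-learn's current module layout (in 2015 the same functionality lived in sklearn.grid_search and sklearn.cross_validation) and a stand-in dataset rather than the tutorial's preprocessed data:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset, not the tutorial's data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid of KNN hyperparameters to search with cross-validation.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
print("held-out accuracy:", search.score(X_test, y_test))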
@@ -0,0 +1,21 @@
{
"description": "1. Installing tools and packages (ansible scripts/cloud machine will be\n provided)\n2. Basic concepts\n3. geometries. Fixing invalid geometries.\n4. basic operations (union, intersection, difference)\n5. advanced operations (buffer, convex hull, skeleton, voronoi)\n6. Formats and tools\n7. Shapefile\n8. GeoJSON\n9. TopoJSON\n10. GeoTIFF\n11. OSM data\n12. Projections\n13. common projections and their use\n14. how to chose the projection\n15. reprojecting datasets\n16. Using geospatial index\n17. Open Geospatial datasets\n18. UK specific datasets\n19. OpenStreetMap\n20. elevation data\n21. Geocoding\n22. Storing geospatial data in databases (PostgreSQL, SpatiaLite,\n MongoDB, ElasticSearch)\n",
"duration": 4689,
"language": "eng",
"recorded": "2015-06-19",
"speakers": [
"Demeter Sztanko"
],
"summary": "A tutorial covering some general concepts of geospatial data, main\nformats in which it is distributed and some common places where this\ndata can be acquired. We will also learn how to read, process and\nvisualise this data using Python and QGIS. This talk will cover some\ntypical problems one can experience when working with geospatial\ndata.",
"tags": [
"tutorial"
],
"thumbnail_url": "https://i.ytimg.com/vi/raCAJSAFOIU/hqdefault.jpg",
"title": "Analysis and transformation of geospatial data using Python",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=raCAJSAFOIU"
}
]
}
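A short sketch of a few of the geometry operations and the reprojection step from the outline above, using shapely and pyproj; the tutorial itself also covers QGIS, file formats and spatial indexes, which this snippet does not touch:

from shapely.geometry import Point, Polygon
from pyproj import Transformer

a = Point(0, 0).buffer(1.0)                      # buffer: disc of radius 1 around a point
b = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])    # a 2x2 square

print(a.union(b).area)           # union
print(a.intersection(b).area)    # intersection
print(a.difference(b).area)      # difference
print(b.convex_hull.wkt)         # convex hull

# Reproject a WGS84 (lon, lat) coordinate to British National Grid (EPSG:27700).
to_bng = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)
print(to_bng.transform(-0.1278, 51.5074))        # central London as easting/northing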
@@ -0,0 +1,19 @@
{
"description": "Growth Intelligence tracks the performance and activity of all the\ncompanies in the UK economy using their data \u2018footprint\u2019. This involves\ntracking numerous unstructured data points from multiple sources in a\nvariety of formats and transforming them into a standardised feature set\nwe can use for building predictive models for our clients.\n\nIn the past, this data was collected by in a somewhat haphazard fashion:\ncombining manual effort, ad hoc scripting and processing which was\ndifficult to maintain. In order to streamline the data flows, we\u2019re\nusing an open-source Python framework from Spotify called Luigi. Luigi\nwas created for managing task dependencies, monitoring the progress of\nthe data pipeline and providing frameworks for common batch processing\ntasks.\n",
"duration": 2139,
"language": "eng",
"recorded": "2015-06-21",
"speakers": [
"Dylan Barth",
"Stuart Coleman"
],
"summary": "An introduction to Luigi with real life case studies showing how you\ncan break large, multi-step data processing task into a graph of\nsmaller sub-tasks that are aware of the state of their\ninterdependencies.",
"thumbnail_url": "https://i.ytimg.com/vi/gz7tba-R7QY/hqdefault.jpg",
"title": "A Beginner's Guide to Building Data Pipelines with Luigi",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=gz7tba-R7QY"
}
]
}
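To make the task-dependency model concrete, here is a minimal hypothetical Luigi pipeline; the task names, file names and contents are invented for illustration and are not Growth Intelligence's actual pipeline:

import luigi

class ExtractRecords(luigi.Task):
    """Invented extraction step that writes raw records to disk."""

    def output(self):
        return luigi.LocalTarget("raw_records.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("company,signal\nAcme Ltd,42\n")

class BuildFeatures(luigi.Task):
    """Depends on ExtractRecords; Luigi runs incomplete upstream tasks first."""

    def requires(self):
        return ExtractRecords()

    def output(self):
        return luigi.LocalTarget("features.csv")

    def run(self):
        # Stand-in for real feature engineering.
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(src.read().upper())

if __name__ == "__main__":
    luigi.build([BuildFeatures()], local_scheduler=True)

Because each task declares its output, Luigi skips tasks whose targets already exist, which is what gives the pipeline its restartable, dependency-aware behaviour.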
@@ -0,0 +1,18 @@
{
"description": "Search is a hard area to work in. Techniques are not made public due to\ntheir value and little academic work is done in the area. Furthermore,\nGoogle has made the exceptional an everyday experience so the bar for\nsuccess is very high from the outset.\n\nSearch data sets are also hard to create due to the nebulous\never-changing concept of search relevancy. When, and to what degree, is\na result deemed to be relevant for a given search term? The\nElasticSearch documentation states it well: *\" Search relevancy tuning\nis a rabbit hole that you can easily fall into and never emerge\"*.\n\nIn this presentation I'll give a introduction to building a search\nrelevancy data set with python using crowd-sourcing and the Trueskill\nalgorithm from Microsoft. Trueskill is used for matchmaking on XBox Live\nand it allows us to transform moderated pairwise comparisons into\nrankings. The rankings can then be used to learn what results best match\na given search phrase. I'll briefly cover how we're modeling the\nmoderated rankings at Lyst using deep learning.\n\nReferences\n~~~~~~~~~~\n\nM. Hadi Kiapour, Kota Yamaguchi, Alexander C. Berg, Tamara L. Berg.\nHipster Wars: Discovering Elements of Fashion Styles (2014).\n\nYelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Gr\u00e9goire Mesnil.\nLearning semantic representations using convolutional neural networks\nfor web search (2014).\n\nRalf Herbrich, Tom Minka, and Thore Graepel. TrueSkill(TM): A Bayesian\nSkill Rating System (2007).\n",
"duration": 1342,
"language": "eng",
"recorded": "2015-06-20",
"speakers": [
"Eddie Bell"
],
"summary": "Building a search engine is a dark art that is made even more\ndifficult by the nebulous ever-changing concept of search relevancy.\nWhen, and to what degree, is a result deemed to be relevant for a\ngiven search term? In this talk I will describe how we built a Lyst\nsearch relevancy data set using heuristics, crowd-sourcing and Xbox\nLive matchmaking.\n\nFull details \u2014\u00a0http://london.pydata.org/schedule/presentation/1/",
"thumbnail_url": "https://i.ytimg.com/vi/Y_0gF4z-9Nc/hqdefault.jpg",
"title": "The Dark Art of Search Relevancy",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=Y_0gF4z-9Nc"
}
]
}
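A small sketch of the pairwise-comparison idea using the trueskill package: each candidate result keeps a rating, every moderated "which result is better for this query?" judgement updates two ratings, and sorting by a conservative skill estimate yields a ranking. The result names and judgements below are invented:

import trueskill

# One rating per candidate result for a given query (result names invented).
ratings = {doc: trueskill.Rating()
           for doc in ["red dress A", "red dress B", "red shoes C"]}

# Moderated pairwise judgements for the query "red dress": (winner, loser).
judgements = [("red dress A", "red dress B"),
              ("red dress A", "red shoes C"),
              ("red dress B", "red shoes C")]

for winner, loser in judgements:
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(ratings[winner], ratings[loser])

# Rank by a conservative skill estimate (mu - 3 * sigma).
ranked = sorted(ratings, key=lambda d: ratings[d].mu - 3 * ratings[d].sigma, reverse=True)
print(ranked)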
@@ -0,0 +1,18 @@
{
"abstract": "The bibliography process is necessary part of biomedical science, but\nresearchers also tend to find it boring! For it to be less time\nconsuming it would be interesting to automate part of the process.\nHere Python machine learning libraries can help to determine whether\na research article is worth reading!",
"description": "The bibliography process means every scientist regularly has to go\nthrough a lot of published articles in parallel to her/his research. The\naim is to:\n\n- know what other researchers are doing: they might be ahead\n of you, they might have proven your project is a dead end.\n- get some context to interpret your research results.\n\nUsing specialised search engines can be inefficient if you don't use\nthe \"right\" keywords. Researcher also tend to find bibliography\nboring so it would be interesting to automate part of the process!\n\nIn my talk I'll answer the following question:\n\n- can Python machine learning libraries (nltk, scikit-learn) be used\n to determine whether a research article is worth reading?\n\nI'll use the TF-IDF measure to identify frequent topics appearing in\nspecific scientific articles and train a classifier to distinguish\nbetween relevant and non-relevant articles depending and someone's\ninterests.\n",
"duration": 2226,
"language": "eng",
"recorded": "2015-06-20",
"speakers": [
"\u00c9l\u00e9onore Mayola"
],
"thumbnail_url": "https://i.ytimg.com/vi/400rl6PNzgE/hqdefault.jpg",
"title": "Getting Meaning from Scientific Articles",
"videos": [
{
"type": "youtube",
"url": "https://www.youtube.com/watch?v=400rl6PNzgE"
}
]
}
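A minimal sketch of the kind of pipeline the talk describes: TF-IDF features over article text feeding a classifier that separates relevant from non-relevant articles. The toy abstracts and labels are invented, and the talk itself also draws on nltk:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: invented abstracts labelled relevant (1) or not relevant (0).
abstracts = [
    "CRISPR screening of kinase pathways in tumour cells",
    "Deep sequencing reveals regulatory RNA in stem cells",
    "Thermal properties of novel alloy coatings",
    "Finite element analysis of bridge load tolerance",
]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a simple linear classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
model.fit(abstracts, labels)

print(model.predict(["single cell RNA analysis of the tumour microenvironment"]))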