Team Members: Wenqing Zhu (wz1070), Xiaoyu Wang (xw1435), Peimeng Sui (ps3336)
Every summer, English Premier League clubs spend large sums on the transfer market to acquire new players. We want to give managers a data-driven way to optimize their transfer decisions. We will scrape player transfer records and performance stats in Python and use the NumPy and pandas packages to clean and preprocess them. We will then build metrics that quantify the return on investment in the transfer market and, ideally, present them as easy-to-read visualizations made with matplotlib. Some metrics we have in mind: money spent per goal scored by purchased players, a club's improvement in performance relative to its investment, and the performance improvement of transferred players. To make these metrics accessible to our potential users, we will build an interface, either in the terminal or in a Python notebook, that lets the user interactively control the analysis and the display of the data.
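As a rough illustration of the first metric, the sketch below computes money spent per goal for each club using only the standard library. The function name `money_per_goal_by_club`, the field names `fee_m` and `goals`, and all records are hypothetical placeholders, not scraped data; fees are assumed to be in millions. The same aggregation could equally be written as a pandas group-by.

```python
from collections import defaultdict

# Hypothetical records: names, fees (in millions), and goals are placeholders.
transfers = [
    {"player": "Player 1", "club": "Club A", "fee_m": 30.0, "goals": 10},
    {"player": "Player 2", "club": "Club A", "fee_m": 10.0, "goals": 0},
    {"player": "Player 3", "club": "Club B", "fee_m": 12.0, "goals": 6},
    {"player": "Player 4", "club": "Club C", "fee_m": 8.0, "goals": 0},
]


def money_per_goal_by_club(records):
    """Return {club: total fee / total goals}; None when the club's signings scored no goals."""
    fees = defaultdict(float)
    goals = defaultdict(int)
    for record in records:
        fees[record["club"]] += record["fee_m"]
        goals[record["club"]] += record["goals"]
    return {
        club: (fees[club] / goals[club] if goals[club] else None)
        for club in fees
    }


cost_per_goal = money_per_goal_by_club(transfers)
for club, value in sorted(cost_per_goal.items()):
    label = "no goals" if value is None else f"{value:.2f}m per goal"
    print(f"{club}: {label}")
```

Aggregating at club level before dividing avoids a division by zero for individual signings who have not scored; a club whose signings scored nothing is reported as `None` rather than as an infinite cost.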
Our dataset is scraped from www.transfermarkt.com, the most popular data provider for soccer players' transfer records and stats. So far we have scraped all transfer records of English Premier League clubs for summer 2016, along with the corresponding up-to-date player stats for the 2016-2017 season. We plan to scrape data in the same format for summer 2015 and summer 2014 as well. The basic variables we have for each player are: transfer value, club, position, age, goals, assists, minutes, and minutes per goal. We will scrape additional data if we find anything else of interest.
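One possible way to hold the per-player variables listed above is sketched below. The class name `PlayerRecord`, its field names, the unit of the transfer value (millions), and the example values are all assumptions for illustration. Minutes per goal is derived from minutes and goals here rather than stored, so that players with no goals are handled explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerRecord:
    """One transferred player; field names are illustrative, not the scraped schema."""
    name: str
    club: str
    position: str
    age: int
    transfer_value_m: float  # assumed to be in millions
    goals: int
    assists: int
    minutes: int

    @property
    def minutes_per_goal(self) -> Optional[float]:
        # Undefined for players who have not scored, so return None instead of dividing by zero.
        return self.minutes / self.goals if self.goals > 0 else None


# Placeholder values, not real data.
example = PlayerRecord(
    name="Player 1", club="Club A", position="Forward", age=24,
    transfer_value_m=30.0, goals=10, assists=4, minutes=1800,
)
print(example.minutes_per_goal)
```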