Commit 20473de

updated final paper with conclusion

leerichardson committed Dec 8, 2014
1 parent be8123d
Showing 5 changed files with 43 additions and 25 deletions.
9 changes: 6 additions & 3 deletions final_paper/final_paper/final_paper.aux
@@ -47,19 +47,22 @@
\@writefile{toc}{\contentsline {subsection}{\numberline {4.2}Training the model and the test error}{5}{subsection.4.2}}
\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces A look at how feature selection change the prediction accuracy.}}{5}{table.2}}
\newlabel{table:matrix}{{2}{5}{A look at how feature selection change the prediction accuracy}{table.2}{}}
\bibcite{nba_oracle}{1}
\bibcite{data_mining}{2}
\bibcite{rpm}{3}
\citation{projections}
\@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces This plot shows the test error for our different algorithms in all of our years. From this, we see that Naive Bayes and Logistic Regression gave us the best overall test error. For each testing year, we used all the previous years as training data.}}{6}{figure.3}}
\newlabel{fig:database}{{3}{6}{This plot shows the test error for our different algorithms in all of our years. From this, we see that Naive Bayes and Logistic Regression gave us the best overall test error. For each testing year, we used all the previous years as training data}{figure.3}{}}
\@writefile{toc}{\contentsline {section}{\numberline {5}Simulation}{6}{section.5}}
\@writefile{toc}{\contentsline {section}{\numberline {6}Conclusion}{6}{section.6}}
\@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Results from simulating the 2013 NBA season 1000 times, using probabilities from our best Logistic regression model. The blue lines represent our confidence intervals whereas the Red lines represent the actual number of wins for each team. Our simulations trapped the true number of wins in 70 \% of our intervals.}}{7}{figure.4}}
\newlabel{fig:simulations}{{4}{7}{Results from simulating the 2013 NBA season 1000 times, using probabilities from our best Logistic regression model. The blue lines represent our confidence intervals whereas the Red lines represent the actual number of wins for each team. Our simulations trapped the true number of wins in 70 \% of our intervals}{figure.4}{}}
\citation{sportsvu}
\bibcite{nba_oracle}{1}
\bibcite{data_mining}{2}
\bibcite{rpm}{3}
\bibcite{bigrpm}{4}
\bibcite{rpm_data}{5}
\bibcite{bball_ref}{6}
\bibcite{espn}{7}
\bibcite{gitrepo}{8}
\bibcite{projections}{9}
\bibcite{revolution}{10}
\bibcite{sportsvu}{11}
43 changes: 22 additions & 21 deletions final_paper/final_paper/final_paper.log
@@ -1,4 +1,4 @@
This is pdfTeX, Version 3.1415926-2.5-1.40.14 (MiKTeX 2.9) (preloaded format=pdflatex 2014.8.5) 7 DEC 2014 23:35
This is pdfTeX, Version 3.1415926-2.5-1.40.14 (MiKTeX 2.9) (preloaded format=pdflatex 2014.8.5) 8 DEC 2014 17:27
entering extended mode
**C:/Users/leeri_000/basketball_stats/game_simulation/final_paper/final_paper/f
inal_paper.tex
@@ -537,46 +537,47 @@ Underfull \hbox (badness 10000) in paragraph at lines 166--167

[]

[6 <C:/Users/leeri_000/basketball_stats/game_simulation/final_paper/final_paper
/algorithms.png>] [7 <C:/Users/leeri_000/basketball_stats/game_simulation/final
_paper/final_paper/season_wins.png>]
Missing character: There is no � in font ptmr7t!
Missing character: There is no � in font ptmr7t!

Underfull \hbox (badness 10000) in paragraph at lines 180--181
Underfull \hbox (badness 10000) in paragraph at lines 199--200
[]\OT1/ptm/m/n/10 Paul Fearn-head, Ben-jamin M. Tay-lor \OT1/ptm/m/it/10 On Es-
ti-mat-ing the Abil-ity of NBA Play-ers\OT1/ptm/m/n/10 . 2010:
[]

[6 <C:/Users/leeri_000/basketball_stats/game_simulation/final_paper/final_paper
/algorithms.png>] [7 <C:/Users/leeri_000/basketball_stats/game_simulation/final
_paper/final_paper/season_wins.png>]
Underfull \hbox (badness 10000) in paragraph at lines 194--195
[8]
Underfull \hbox (badness 10000) in paragraph at lines 213--214
[]\OT1/ptm/m/n/10 Sam Hinkie and the An-a-lyt-ics Rev-o-lu-tion in Bas-ket-ball
. NILKA-NTH PA-TEL
. Nilka-nth Pa-tel.
[]

Package atveryend Info: Empty hook `BeforeClearDocument' on input line 197.
[8]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 197.
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 221.
[9]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 221.

(C:\Users\leeri_000\basketball_stats\game_simulation\final_paper\final_paper\fi
nal_paper.aux)
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 197.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 197.
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 221.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 221.
Package rerunfilecheck Info: File `final_paper.out' has not changed.
(rerunfilecheck) Checksum: 26928A4D4C4C62E8B62A270908ECD09C;736.


LaTeX Warning: There were multiply-defined labels.

Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 197.
Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 221.
)
Here is how much of TeX's memory you used:
7717 strings out of 493921
110991 string characters out of 3144865
208312 words of memory out of 3000000
10866 multiletter control sequences out of 15000+200000
7720 strings out of 493921
111020 string characters out of 3144865
208343 words of memory out of 3000000
10867 multiletter control sequences out of 15000+200000
24922 words of font info for 61 fonts, out of 3000000 for 9000
841 hyphenation exceptions out of 8191
37i,10n,39p,1543b,451s stack positions out of 5000i,500n,10000p,200000b,50000s
37i,10n,39p,1543b,441s stack positions out of 5000i,500n,10000p,200000b,50000s
{C:/Program Files (x86)/MiKTeX 2.9/fonts/enc/dvips/fontname/8r.enc}<C:/Progra
m Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmex10.pfb><C:/Program
Files (x86)/MiKTeX 2.9/fonts/type1/public/amsfonts/cm/cmmi10.pfb><C:/Program Fi
@@ -588,9 +589,9 @@ KTeX 2.9/fonts/type1/urw/courier/ucrr8a.pfb><C:/Program Files (x86)/MiKTeX 2.9/
fonts/type1/urw/times/utmb8a.pfb><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1
/urw/times/utmr8a.pfb><C:/Program Files (x86)/MiKTeX 2.9/fonts/type1/urw/times/
utmri8a.pfb>
Output written on final_paper.pdf (8 pages, 679887 bytes).
Output written on final_paper.pdf (9 pages, 685076 bytes).
PDF statistics:
296 PDF objects out of 1000 (max. 8388607)
38 named destinations out of 1000 (max. 500000)
304 PDF objects out of 1000 (max. 8388607)
40 named destinations out of 1000 (max. 500000)
117 words of extra memory for PDF output out of 10000 (max. 10000000)

Binary file modified final_paper/final_paper/final_paper.pdf
Binary file not shown.
Binary file modified final_paper/final_paper/final_paper.synctex.gz
Binary file not shown.
16 changes: 15 additions & 1 deletion final_paper/final_paper/final_paper.tex
@@ -171,15 +171,24 @@ \section{Simulation}
\section {Conclusion}

%% General Theory for How to clasify. Why is predicting the outcomes of seasons harder?
In the end, we feel there is a simple approach to choosing the best classifier. Two factors have the largest impact on each game: home-court advantage and the quality of each team. Imagine that there is some perfect measure of how good each team is, and that two teams are identical by that measure. In this situation, you should always pick the home team to win. The question then becomes: how much better does the away team have to be than the home team before you should pick them instead? The classification problem thus reduces to estimating that threshold, while other basketball-related prediction problems are much more difficult to attempt.
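To make the threshold idea concrete, here is a minimal Python sketch, assuming a single scalar quality rating per team; the function name, the ratings, and the home-court edge of 3.0 rating points are hypothetical values for illustration, not part of our system.

def predict_winner(home_rating, away_rating, home_edge=3.0):
    # Pick the home team unless the away team's rating exceeds the
    # home team's by more than the assumed home-court edge.
    return "home" if away_rating - home_rating <= home_edge else "away"

# A slightly better away team still loses the pick to home court...
print(predict_winner(home_rating=5.0, away_rating=7.0))   # -> home
# ...but a much better away team overcomes it.
print(predict_winner(home_rating=5.0, away_rating=10.0))  # -> away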

%% Discuss why other systems bear us. (Projections, more developed, more nuances built in, etc...) Hollinger won, had been doing it the longers, mainly with PER
One of the other prediction tasks we attempted was projecting how many games each team would win in a given season. For this task, there were more public predictions available to compare our results against, and we saw that while our system was good, it did not approach some of the best publicly available projection systems in the world. For instance, John Hollinger, recently hired by the Memphis Grizzlies, had the best projection system in 2012. He has been doing this for around five years, so it makes sense that his system would be more advanced. As we discuss below, there are many nuances that could be built into a projection system to make it more accurate, and a single semester was not enough time to build in everything we hoped to include.


%% Why was 2013 so much less predictable than other years. Especially since we had the MOST training data. This gives some evidence to the theory that randomness may impact the results of these games more than a fancy classifier. sometimes the underdog will just win more.
One interesting thing that came out of our report was how much higher our prediction error was in the final season, 2013, compared with all of the others. This is counter to what one would expect in a typical machine learning setting, since we had the most training data for 2013. However, we think the reason is that there were simply more upsets in 2013 than in any of the other seasons; most projection systems also showed much lower accuracy in 2013 than in 2012. In practice, most classification systems will simply choose the favorite to win, so if a season has more unexpected results than usual, it is not hard to imagine the predictions being worse. %% Salary cap? Player Movement?


%% RAPM- Why did it do so much better? More information?? What does this say about SportsVU data's potential impact as a predictor?
Another interesting result was how much better the RAPM statistic was at predicting outcomes than anything else we tried. We think the main reason is simply that it conveys more information than box-score statistics, the other statistics usually available. RAPM uses possession-level events to construct offensive and defensive ratings for each player, and all box-score statistics can be computed from the same possession-level data used to calculate RAPM. What does this mean for the future of statistics in the NBA? We think it indicates that statistics like RAPM may soon become less predictive, as even higher-resolution camera data is now being collected via the SportVU system \cite{sportsvu}. The data from these cameras tracks the location of each player 10 times each second, and is sure to produce some novel insights into the game of basketball.


%% Future developments: Using current season's data, Predicting the Spread, and building a projection of the RPM feature as opposed to just last year (more than one year, what to do with rookies? Etc..)
Finally, we look at ways in which we could have improved our classifier. One of the main flaws in our system is that we only used individual player data from the previous season to predict the current season. This means that if a player misses the previous season due to injury (e.g., Derrick Rose or Rajon Rondo), a significant aspect of a team may be underrated by our system. One way to combat this would be to use projected season statistics to predict the current season, as some of the most accurate RAPM-based systems did in 2013. That way, we could account for things like player age, injury history, and expected regression from year to year. We could also consider not just the previous season, but a weighted average of the past 3-5 seasons, to estimate how good each player is.
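As a rough illustration of the multi-season weighting, here is a minimal Python sketch; the function name, the 0.2/0.3/0.5 weights, and the sample RAPM values are hypothetical stand-ins, not values from our system.

def blended_rapm(rapm_history, weights=(0.2, 0.3, 0.5)):
    # Weighted average of a player's most recent seasons of RAPM,
    # with heavier weight on recent years; short histories (rookies,
    # players returning from injury) just use the seasons that exist.
    recent = rapm_history[-len(weights):]
    used = weights[-len(recent):]
    return sum(w * r for w, r in zip(used, recent)) / sum(used)

print(blended_rapm([1.2, 2.0, 3.1]))  # three seasons, newest weighted most
print(blended_rapm([4.0]))            # a single-season history still works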

We also wanted to incorporate current-season data into our projections. To do this, we could divide each season into K chunks and use each team's winning percentage in the chunks played so far as additional features when predicting a given game's outcome. As seen in Figure \ref{fig:simulations}, some of our preseason estimates of team quality were far from the teams' actual win totals. Incorporating current-season data would let us recover when those preseason estimates start to drift, updating our valuation of each team as more information accumulates throughout the season.
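To illustrate the chunking idea, a minimal Python sketch follows; the chunk count k=3 and the win/loss sequence are hypothetical examples (any leftover games past an even split are ignored in this sketch).

def chunk_win_pcts(results, k):
    # Split a team's season-to-date results (1 = win, 0 = loss) into k
    # equal chunks and return the winning percentage in each, for use
    # as features in the game-level classifier.
    n = max(1, len(results) // k)
    chunks = [results[i * n:(i + 1) * n] for i in range(k)]
    return [sum(c) / len(c) for c in chunks if c]

season_so_far = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
print(chunk_win_pcts(season_so_far, k=3))  # -> [0.75, 0.5, 0.75]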

\begin{thebibliography}{1}

@@ -201,7 +210,12 @@ \section{Simulation}

\bibitem{projections} Weak Side Awareness Blog. http://weaksideawareness.wordpress.com/2013/04/23/checking-2012-13-nba-win-predictions-projections/

\bibitem{revolution} Sam Hinkie and the Analytics Revolution in Basketball. NILKANTH PATEL http://www.newyorker.com/news/sporting-scene/sam-hinkie-and-the-analytics-revolution-in-basketball
\bibitem{revolution} Sam Hinkie and the Analytics Revolution in Basketball. Nilkanth Patel. http://www.newyorker.com/news/sporting-scene/sam-hinkie-and-the-analytics-revolution-in-basketball

\bibitem{sportsvu} Zach Lowe. http://grantland.com/features/the-toronto-raptors-sportvu-cameras-nba-analytical-revolution/

\end{thebibliography}



\end{document}
