Commit
Re-rendered all figures to not have type 3 fonts for camera-ready version
duvenaud committed May 18, 2015
1 parent b3205f7 commit ab7220f
Showing 2 changed files with 6 additions and 55 deletions.
59 changes: 5 additions & 54 deletions paper/hypergrad_paper.tex
@@ -8,11 +8,11 @@
\usepackage{amsmath}
\usepackage{hyperref}
\usepackage{multirow}
\newcommand{\theHalgorithm}{\arabic{algorithm}}
%\newcommand{\theHalgorithm}{\arabic{algorithm}}
\usepackage[seed=03492]{randomorder} % This will be updated with the arXiv ID numbers following the period, but not the version number.
\usepackage{arxiv}
%\usepackage[accepted]{icml2015stylefiles/icml2015}

%\usepackage{arxiv}
\usepackage[accepted]{icml2015stylefiles/icml2015}
\usepackage{textcomp}

\newcommand{\vw}{\mathbf{w}}
\newcommand{\vv}{\mathbf{v}}
@@ -154,7 +154,7 @@ \section{Hypergradients}
\label{sec:hypergradients}

Reverse-mode differentiation (RMD) has been an asset to the field of machine
learning~\citep{lecun1989backpropagation} (see the \ref{sec:appendix} for a refresher). The RMD method, known as
learning~\citep{lecun1989backpropagation} (see the appendix for a refresher). The RMD method, known as
``backpropagation'' in the deep learning community, allows the gradient of a
scalar loss with respect to its parameters to be computed in a single backward
pass.
@@ -748,57 +748,8 @@ \section*{Acknowledgments}
Thanks to Jason Rolfe for helpful feedback.
We thank Analog Devices International and Samsung Advanced Institute of Technology for their support.


\section*{Appendix: Forward vs. reverse-mode differentiation}
\label{sec:appendix}
By the chain rule, the gradient of a set of nested functions is given by the product of the individual derivatives of each function:
%
\begin{align*}
\pderiv{f_4(f_3(f_2(f_1(x))))}{x} = \pderiv{f_4}{f_3} \cdot \pderiv{f_3}{f_2} \cdot \pderiv{f_2}{f_1} \cdot \pderiv{f_1}{x}
\end{align*}
If each function has multivariate inputs and outputs, the gradients are
Jacobian matrices.

Forward- and reverse-mode differentiation differ
only in the order in which they evaluate this product.
%
Forward-mode differentiation works by multiplying gradients in the same order as
the functions are evaluated:
%
\begin{align*}
\pderiv{f_4(f_3(f_2(f_1(x))))}{x} = \pderiv{f_4}{f_3} \cdot \left( \pderiv{f_3}{f_2} \cdot \left( \pderiv{f_2}{f_1} \cdot \pderiv{f_1}{x} \right) \right)
\end{align*}
%
Reverse-mode multiplies the gradients in the opposite order, starting from the
final result:
%
\begin{align*}
\pderiv{f_4(f_3(f_2(f_1(x))))}{x} = \left( \left( \pderiv{f_4}{f_3} \cdot \pderiv{f_3}{f_2} \right) \cdot \pderiv{f_2}{f_1} \right) \cdot \pderiv{f_1}{x}
\end{align*}
%
In an optimization setting, the final result of the nested functions, $f_4$, is
a scalar, while the input $x$ and intermediate values, $f_1 - f_3$, can be
vectors. In this scenario the advantage of reverse-mode
differentiation is very clear. Let's imagine that the dimensionality of all the
intermediate vectors is $D$. In reverse mode, we start from the (scalar) output,
and multiply by the next $D \times D$ Jacobian at each step. The value we
accumulate is just a $D$-dimensional vector. In forward mode, however, we must
accumulate an entire $D \times D$ matrix at each step. But do we still have
to compute and instantiate the $D \times D$ Jacobian matrices themselves
either way? In general, yes. But in the (common) case that the vector-to-vector
functions are either elementwise operations or (reshaped) matrix multiplications, the
Jacobian matrices can actually be very sparse, and multiplication by the
Jacobian can be performed efficiently without instantiation~\cite{pearlmutter2008reverse}.
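
A minimal numerical sketch of this asymmetry (not from the paper; the tanh
layers, the norm loss, and the size $D$ are illustrative assumptions) chains
four elementwise functions, then accumulates the chain-rule product in both
orders and checks that the results agree:

import numpy as np

D = 500                                  # dimensionality of every intermediate vector
rng = np.random.default_rng(0)
x = rng.standard_normal(D)

def layer(v):
    # One elementwise function f_i and its (diagonal, hence sparse) D x D Jacobian.
    return np.tanh(v), np.diag(1.0 / np.cosh(v) ** 2)

# Forward pass through f_1 ... f_4, storing each Jacobian so the product can be
# accumulated in either order afterwards.
jacobians, v = [], x
for _ in range(4):
    v, J = layer(v)
    jacobians.append(J)
dL_dout = v / np.linalg.norm(v)          # gradient of the scalar loss L = ||f_4|| w.r.t. f_4

# Forward mode: accumulate a full D x D matrix, in the same order as evaluation.
acc = np.eye(D)
for J in jacobians:
    acc = J @ acc                        # D x D times D x D at every step
grad_forward = dL_dout @ acc

# Reverse mode: accumulate only a length-D row vector, starting from the output.
acc = dL_dout
for J in reversed(jacobians):
    acc = acc @ J                        # vector times D x D at every step
grad_reverse = acc

assert np.allclose(grad_forward, grad_reverse)

The forward-mode loop carries $O(D^2)$ state and costs $O(D^3)$ per step, while
the reverse-mode loop carries only $O(D)$ state and costs $O(D^2)$ per step
(or $O(D)$ if the diagonal structure of these particular Jacobians is exploited).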

The main drawback of reverse-mode differentiation is that intermediate values
must be maintained in memory during the forward pass. In sections
\ref{sec:reversible learning} and \ref{sec:reversible computation}, we show how
to drastically reduce the memory requirements of reverse-mode differentiation
when differentiating through the entire learning procedure.

\bibliography{references.bib}
\bibliographystyle{icml2015stylefiles/icml2015}


\end{document}

2 changes: 1 addition & 1 deletion paper/icml2015stylefiles/icml2015.sty
@@ -108,7 +108,7 @@
% change that text.
%%%%%%%%%%%%%%%%%%%%
\newcommand{\ICML@appearing}{\textit{Proceedings of the
$\mathit{31}^{st}$ International Conference on Machine Learning},
$\mathit{32}^{nd}$ International Conference on Machine Learning},
Lille, France, 2015. JMLR: W\&CP volume 37.
Copyright 2015 by the author(s).}


0 comments on commit ab7220f