Submit changes before installing new os
MartinThoma committed May 12, 2018
1 parent cd66817 commit e03eaa2
Showing 6 changed files with 163 additions and 52 deletions.
34 changes: 17 additions & 17 deletions documents/cv-curriculum-vitae/cv-curriculum-vitae.tex
@@ -163,22 +163,6 @@ \section{Work Experience}
and a big but algorithmically unchallenging project. To be honest,
I only fixed some Java bugs.}\\

%----------------------------------------------------------------------------------------
% WORK EXPERIENCE -2-

{\raggedleft\textsc{2011}\par}

{\raggedright\large Student research assistant at \textsc{ Institute of Toxicology and Genetics}, KIT\\
\textit{participating in a university research project}\\[5pt]}

\normalsize{In summer 2011 I worked for over a month for a
research project at KIT. I have written bash scripts for file
conversions, fixed some bugs and re-written a slow Mathematica script
in a much faster Python version. But it quickly turned out that
this project had a lot of C++ source which was rarely commented or
documented. I realized, that I wouldn't have time for this project
after beginning my studies at university.}\\

%----------------------------------------------------------------------------------------
% WORK EXPERIENCE -4-

@@ -208,7 +192,7 @@ \section{Work Experience}

\colorbox{shade}{\textcolor{text1}{
\begin{tabular}{c|p{7cm}}
\raisebox{-4pt}{\textifsymbol{18}} & Parkstraße 17, 76131 Karlsruhe \\ % Address
\raisebox{-4pt}{\textifsymbol{18}} & Alte Allee 107, 81245 Munich \\ % Address
\raisebox{-3pt}{\Mobilefone} & +49 $($1636$)$ 28 04 91 \\ % Phone number
\raisebox{-1pt}{\Letter} & \href{mailto:[email protected]}{[email protected]} \\ % Email address
\Keyboard & \href{http://martin-thoma.com}{martin-thoma.com} \\ % Website
@@ -331,6 +315,22 @@ \section{Language Skills}
%----------------------------------------------------------------------------------------

\section{Work Experience}
%----------------------------------------------------------------------------------------
% WORK EXPERIENCE -2-

{\raggedleft\textsc{2011}\par}

{\raggedright\large Student research assistant at \textsc{Institute of Toxicology and Genetics}, KIT\\
\textit{participating in a university research project}\\[5pt]}

\normalsize{In summer 2011 I worked for over a month on a
research project at KIT. I wrote bash scripts for file
conversions, fixed some bugs and rewrote a slow Mathematica script
as a much faster Python version. However, it quickly turned out that
this project had a lot of C++ source code which was rarely commented or
documented. I realized that I wouldn't have time for this project
after beginning my studies at university.}\\

%----------------------------------------------------------------------------------------
% WORK EXPERIENCE -3-

Binary file not shown.
7 changes: 4 additions & 3 deletions publications/activation-functions/abstract.tex
@@ -1,7 +1,8 @@
\begin{abstract}
This paper reviews the most common activation functions for convolutional neural
networks. They are evaluated on TODO dataset and possible reasons for the
differences in their performance are given.
networks. They are evaluated on the Asirra, GTSRB, HASYv2, STL-10, CIFAR-10,
CIFAR-100 and MNIST datasets. Possible reasons for the differences in their
performance are given.

New state of the art results are achieved for TODO.
New state-of-the-art results are achieved for Asirra, GTSRB, HASYv2 and STL-10.
\end{abstract}
115 changes: 99 additions & 16 deletions publications/activation-functions/appendix.tex
@@ -7,17 +7,17 @@ \section*{Overview}
\centering
\hspace*{-1cm}\begin{tabular}{lllll}
\toprule
Name & Function $\varphi(x)$ & Range of Values & $\varphi'(x)$ \\\midrule % & Used by
Sign function$^\dagger$ & $\begin{cases}+1 &\text{if } x \geq 0\\-1 &\text{if } x < 0\end{cases}$ & $\Set{-1,1}$ & $0$ \\%& \cite{971754} \\
\parbox[t]{2.6cm}{Heaviside\\step function$^\dagger$} & $\begin{cases}+1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & $\Set{0, 1}$ & $0$ \\%& \cite{mcculloch1943logical}\\
Logistic function & $\frac{1}{1+e^{-x}}$ & $[0, 1]$ & $\frac{e^x}{(e^x +1)^2}$ \\%& \cite{duch1999survey} \\
Tanh & $\frac{e^x - e^{-x}}{e^x + e^{-x}} = \tanh(x)$ & $[-1, 1]$ & $\sech^2(x)$ \\%& \cite{LeNet-5,Thoma:2014}\\
\gls{ReLU}$^\dagger$ & $\max(0, x)$ & $[0, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ \\%& \cite{AlexNet-2012}\\
\parbox[t]{2.6cm}{\gls{LReLU}$^\dagger$\footnotemark\\(\gls{PReLU})} & $\varphi(x) = \max(\alpha x, x)$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha &\text{if } x < 0\end{cases}$ \\%& \cite{maas2013rectifier,he2015delving} \\
Softplus & $\log(e^x + 1)$ & $(0, +\infty)$ & $\frac{e^x}{e^x + 1}$ \\%& \cite{dugas2001incorporating,glorot2011deep} \\
\gls{ELU} & $\begin{cases}x &\text{if } x > 0\\\alpha (e^x - 1) &\text{if } x \leq 0\end{cases}$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha e^x &\text{otherwise}\end{cases}$ \\%& \cite{clevert2015fast} \\
Softmax$^\ddagger$ & $o(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & $[0, 1]^K$ & $o(\mathbf{x})_j \cdot \frac{\sum_{k=1}^K e^{x_k} - e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ \\%& \cite{AlexNet-2012,Thoma:2014}\\
Maxout$^\ddagger$ & $o(\mathbf{x}) = \max_{x \in \mathbf{x}} x$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x_i = \max \mathbf{x}\\0 &\text{otherwise}\end{cases}$ \\%& \cite{goodfellow2013maxout} \\
Name & Function $\varphi(x)$ & Range of Values & $\varphi'(x)$ & Used by \\\midrule %
Sign function$^\dagger$ & $\begin{cases}+1 &\text{if } x \geq 0\\-1 &\text{if } x < 0\end{cases}$ & $\Set{-1,1}$ & $0$ & \cite{971754} \\
\parbox[t]{2.6cm}{Heaviside\\step function$^\dagger$} & $\begin{cases}+1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & $\Set{0, 1}$ & $0$ & \cite{mcculloch1943logical}\\
Logistic function & $\frac{1}{1+e^{-x}}$ & $[0, 1]$ & $\frac{e^x}{(e^x +1)^2}$ & \cite{duch1999survey} \\
Tanh & $\frac{e^x - e^{-x}}{e^x + e^{-x}} = \tanh(x)$ & $[-1, 1]$ & $\sech^2(x)$ & \cite{LeNet-5,Thoma:2014}\\
\gls{ReLU}$^\dagger$ & $\max(0, x)$ & $[0, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & \cite{AlexNet-2012}\\
\parbox[t]{2.6cm}{\gls{LReLU}$^\dagger$\footnotemark\\(\gls{PReLU})} & $\varphi(x) = \max(\alpha x, x)$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha &\text{if } x < 0\end{cases}$ & \cite{maas2013rectifier,he2015delving} \\
Softplus & $\log(e^x + 1)$ & $(0, +\infty)$ & $\frac{e^x}{e^x + 1}$ & \cite{dugas2001incorporating,glorot2011deep} \\
\gls{ELU} & $\begin{cases}x &\text{if } x > 0\\\alpha (e^x - 1) &\text{if } x \leq 0\end{cases}$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha e^x &\text{otherwise}\end{cases}$ & \cite{clevert2015fast} \\
Softmax$^\ddagger$ & $o(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & $[0, 1]^K$ & $o(\mathbf{x})_j \cdot \frac{\sum_{k=1}^K e^{x_k} - e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & \cite{AlexNet-2012,Thoma:2014}\\
Maxout$^\ddagger$ & $o(\mathbf{x}) = \max_{x \in \mathbf{x}} x$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x_i = \max \mathbf{x}\\0 &\text{otherwise}\end{cases}$ & \cite{goodfellow2013maxout} \\
\bottomrule
\end{tabular}
\caption[Activation functions]{Overview of activation functions. Functions
@@ -63,13 +63,11 @@ \section*{Evaluation Results}
\end{tabular}
\caption[Activation function evaluation results on CIFAR-100]{Training and
test accuracy of adjusted baseline models trained with different
activation functions on CIFAR-100. For LReLU, $\alpha = 0.3$ was
activation functions on CIFAR-100. For \gls{LReLU}, $\alpha = 0.3$ was
chosen.}
\label{table:CIFAR-100-accuracies-activation-functions}
\end{table}

\glsreset{LReLU}

\begin{table}[H]
\centering
\setlength\tabcolsep{1.5pt}
@@ -91,7 +89,7 @@ \section*{Evaluation Results}
\end{tabular}
\caption[Activation function evaluation results on HASYv2]{Test accuracy of
adjusted baseline models trained with different activation
functions on HASYv2. For LReLU, $\alpha = 0.3$ was chosen.}
functions on HASYv2. For \gls{LReLU}, $\alpha = 0.3$ was chosen.}
\label{table:HASYv2-accuracies-activation-functions}
\end{table}

@@ -116,8 +114,93 @@ \section*{Evaluation Results}
\end{tabular}
\caption[Activation function evaluation results on STL-10]{Test accuracy of
adjusted baseline models trained with different activation
functions on STL-10. For LReLU, $\alpha = 0.3$ was chosen.}
functions on STL-10. For \gls{LReLU}, $\alpha = 0.3$ was chosen.}
\label{table:STL-10-accuracies-activation-functions}
\end{table}

\begin{table}[H]
\centering
\hspace*{-1cm}\begin{tabular}{lllll}
\toprule
Name & Function $\varphi(x)$ & Range of Values & $\varphi'(x)$ \\\midrule % & Used by
Sign function$^\dagger$ & $\begin{cases}+1 &\text{if } x \geq 0\\-1 &\text{if } x < 0\end{cases}$ & $\Set{-1,1}$ & $0$ \\%& \cite{971754} \\
\parbox[t]{2.6cm}{Heaviside\\step function$^\dagger$} & $\begin{cases}+1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ & $\Set{0, 1}$ & $0$ \\%& \cite{mcculloch1943logical}\\
Logistic function & $\frac{1}{1+e^{-x}}$ & $[0, 1]$ & $\frac{e^x}{(e^x +1)^2}$ \\%& \cite{duch1999survey} \\
Tanh & $\frac{e^x - e^{-x}}{e^x + e^{-x}} = \tanh(x)$ & $[-1, 1]$ & $\sech^2(x)$ \\%& \cite{LeNet-5,Thoma:2014}\\
\gls{ReLU}$^\dagger$ & $\max(0, x)$ & $[0, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\0 &\text{if } x < 0\end{cases}$ \\%& \cite{AlexNet-2012}\\
\parbox[t]{2.6cm}{\gls{LReLU}$^\dagger$\footnotemark\\(\gls{PReLU})} & $\varphi(x) = \max(\alpha x, x)$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha &\text{if } x < 0\end{cases}$ \\%& \cite{maas2013rectifier,he2015delving} \\
Softplus & $\log(e^x + 1)$ & $(0, +\infty)$ & $\frac{e^x}{e^x + 1}$ \\%& \cite{dugas2001incorporating,glorot2011deep} \\
\gls{ELU} & $\begin{cases}x &\text{if } x > 0\\\alpha (e^x - 1) &\text{if } x \leq 0\end{cases}$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x > 0\\\alpha e^x &\text{otherwise}\end{cases}$ \\%& \cite{clevert2015fast} \\
Softmax$^\ddagger$ & $o(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ & $[0, 1]^K$ & $o(\mathbf{x})_j \cdot \frac{\sum_{k=1}^K e^{x_k} - e^{x_j}}{\sum_{k=1}^K e^{x_k}}$ \\%& \cite{AlexNet-2012,Thoma:2014}\\
Maxout$^\ddagger$ & $o(\mathbf{x}) = \max_{x \in \mathbf{x}} x$ & $(-\infty, +\infty)$ & $\begin{cases}1 &\text{if } x_i = \max \mathbf{x}\\0 &\text{otherwise}\end{cases}$ \\%& \cite{goodfellow2013maxout} \\
\bottomrule
\end{tabular}
\caption[Activation functions]{Overview of activation functions. Functions
marked with $\dagger$ are not differentiable at 0 and functions
marked with $\ddagger$ operate on all elements of a layer
simultaneously. The hyperparameter $\alpha \in (0, 1)$ of Leaky
ReLU and ELU is typically set to $\alpha = 0.01$. Other
activation functions like randomized leaky ReLUs
exist~\cite{xu2015empirical}, but are far less commonly used.\\
Some functions are smoothed versions of others, like the logistic
function for the Heaviside step function, tanh for the sign
function, and softplus for ReLU.\\
Softmax is the standard activation function for the last layer of
a classification network as it produces a probability
distribution. See \Cref{fig:activation-functions-plot} for a plot
of some of them.}
\label{table:activation-functions-overview}
\end{table}
\footnotetext{$\alpha$ is a hyperparameter in leaky ReLU, but a learnable parameter in the parametric ReLU function.}
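
As a quick numerical illustration of the remark on softmax in the caption of
\Cref{table:activation-functions-overview} (illustrative values only): for
$\mathbf{x} = (1, 2, 3)$,
% Softmax of (1, 2, 3); the three outputs are non-negative and sum to one.
\[
    o(\mathbf{x}) = \frac{1}{e^1 + e^2 + e^3} \left(e^1, e^2, e^3\right)
                  \approx (0.09, 0.24, 0.67),
\]
which is non-negative and sums to one, i.e.\ a probability distribution over
three classes.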

\begin{figure}[ht]
\centering
\begin{tikzpicture}
\definecolor{color1}{HTML}{E66101}
\definecolor{color2}{HTML}{FDB863}
\definecolor{color3}{HTML}{B2ABD2}
\definecolor{color4}{HTML}{5E3C99}
\begin{axis}[
legend pos=north west,
legend cell align={left},
axis x line=middle,
axis y line=middle,
x tick label style={/pgf/number format/fixed,
/pgf/number format/fixed zerofill,
/pgf/number format/precision=1},
y tick label style={/pgf/number format/fixed,
/pgf/number format/fixed zerofill,
/pgf/number format/precision=1},
grid = major,
width=16cm,
height=8cm,
grid style={dashed, gray!30},
xmin=-2, % start the diagram at this x-coordinate
xmax= 2, % end the diagram at this x-coordinate
ymin=-1, % start the diagram at this y-coordinate
ymax= 2, % end the diagram at this y-coordinate
xlabel=x,
ylabel=y,
tick align=outside,
enlargelimits=false]
\addplot[domain=-2:2, color1, ultra thick,samples=500] {1/(1+exp(-x))};
\addplot[domain=-2:2, color2, ultra thick,samples=500] {tanh(x)};
\addplot[domain=-2:2, color4, ultra thick,samples=500] {max(0, x)};
\addplot[domain=-2:2, color4, ultra thick,samples=500, dashed] {ln(exp(x) + 1)};
\addplot[domain=-2:2, color3, ultra thick,samples=500, dotted] {max(x, exp(x) - 1)};
\addlegendentry{$\varphi_1(x)=\frac{1}{1+e^{-x}}$}
\addlegendentry{$\varphi_2(x)=\tanh(x)$}
\addlegendentry{$\varphi_3(x)=\max(0, x)$}
\addlegendentry{$\varphi_4(x)=\log(e^x + 1)$}
\addlegendentry{$\varphi_5(x)=\max(x, e^x - 1)$}
\end{axis}
\end{tikzpicture}
\caption[Activation functions]{Activation functions plotted on $[-2, +2]$.
$\tanh$ and ELU are able to produce negative numbers. The image of
ELU, ReLU and Softplus is not bounded on the positive side, whereas
$\tanh$ and the logistic function are always below~1.}
\label{fig:activation-functions-plot}
\end{figure}
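
The dashed softplus curve in \Cref{fig:activation-functions-plot} illustrates
the smoothing relationship mentioned in the caption of
\Cref{table:activation-functions-overview}; a short, standard argument is
% Softplus approaches ReLU for inputs of large magnitude.
\[
    \log(e^x + 1) = x + \log\bigl(1 + e^{-x}\bigr),
\]
which tends to $x$ as $x \to +\infty$ and to $0$ as $x \to -\infty$, so
softplus approaches $\max(0, x)$ for inputs of large magnitude.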

\glsreset{LReLU}
\twocolumn
48 changes: 33 additions & 15 deletions publications/activation-functions/content.tex
@@ -1,24 +1,42 @@
%!TEX root = main.tex
\section{Introduction}
TODO\cite{Thoma:2014}

\section{Terminology}
TODO
Artificial neural networks have dozens of hyperparameters which influence
their behaviour during training and evaluation. One of these hyperparameters is
the choice of activation function. While in principle every neuron could have a
different activation function, in practice networks only use two activation
functions: the softmax function for the output layer, in order to obtain a
probability distribution over the possible classes, and one activation function
for all other neurons.

Activation functions should have the following properties:
\begin{itemize}
\item \textbf{Non-linearity}: A linear activation function in a simple
feed-forward network leads to a network which computes a linear
function of its input. This means that no matter how many layers the
network uses, there is an equivalent network with only the input and
the output layer (a worked sketch follows this list). Please note that
\glspl{CNN} are different: operations such as max pooling are
non-linear even without a non-linear activation function.
\item \textbf{Differentiability}: Activation functions need to be
differentiable in order to apply gradient descent. It is not
necessary that they are differentiable at every point. In practice,
the gradient at non-differentiable points can simply be set to zero
in order to prevent weight updates at those points.
\item \textbf{Non-zero gradient}: The sign function is not suitable for
gradient-descent-based optimizers as its gradient is zero at all
points where it is differentiable. An activation function should
have infinitely many points with non-zero gradient.
\end{itemize}
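
A worked sketch of the first point (with illustrative weight matrices
$W_1, W_2$ and bias vectors $b_1, b_2$ that are not used elsewhere in this
paper): composing two affine layers without a non-linearity in between yields
a single affine layer,
% Illustrative derivation: two affine layers collapse into one.
\begin{align*}
    f(x) &= W_2 (W_1 x + b_1) + b_2 \\
         &= (W_2 W_1) x + (W_2 b_1 + b_2) = W' x + b',
\end{align*}
so the hidden layer adds no representational power.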

\section{Activation Functions}
Nonlinear, differentiable activation functions are important for neural
networks to allow them to learn nonlinear decision boundaries. One of the
simplest and most widely used activation functions for \glspl{CNN} is
\gls{ReLU}~\cite{AlexNet-2012}, but others such as
One of the simplest and most widely used activation functions for \glspl{CNN}
is \gls{ReLU}~\cite{AlexNet-2012}, but others such as
\gls{ELU}~\cite{clevert2015fast}, \gls{PReLU}~\cite{he2015delving}, softplus~\cite{7280459}
and softsign~\cite{bergstra2009quadratic} have been proposed. The baseline uses
\gls{ELU}.
and softsign~\cite{bergstra2009quadratic} have been proposed.
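
For reference, softsign is commonly defined as
% Standard softsign definition and derivative; notation follows the overview table.
\[
    \varphi(x) = \frac{x}{1 + |x|}, \qquad
    \varphi'(x) = \frac{1}{(1 + |x|)^2},
\]
with range of values $(-1, 1)$.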

Activation functions differ in their range of values and their derivatives.
The definitions of eleven activation functions and further comparisons are
given in~\cref{table:activation-functions-overview}.
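
As an example of how the derivative column of
\cref{table:activation-functions-overview} is obtained, consider the logistic
function; a short, standard derivation gives
% Derivative of the logistic function, matching the entry in the overview table.
\[
    \varphi(x) = \frac{1}{1 + e^{-x}}
    \quad\Rightarrow\quad
    \varphi'(x) = \frac{e^{-x}}{(1 + e^{-x})^2}
                = \frac{e^x}{(e^x + 1)^2}
                = \varphi(x) \bigl(1 - \varphi(x)\bigr).
\]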


\section{Important Differences of Proposed Activation Functions}
Theoretical explanations of why one activation function is preferable to
another in some scenarios include the following:
\begin{itemize}
@@ -96,6 +114,7 @@ \section{Activation Functions}
logistic function has a much shorter training time and a noticeably lower test
accuracy.

\glsunset{LReLU}
\begin{table}[H]
\centering
\begin{tabular}{lccc}
@@ -111,7 +130,7 @@ \section{Activation Functions}
ReLU & \cellcolor{yellow!25}Yes\footnotemark & \cellcolor{red!25} No & \cellcolor{yellow!25}Half-sided \\
Softplus & \cellcolor{green!25}No & \cellcolor{red!25} No & \cellcolor{yellow!25}Half-sided \\
S2ReLU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
LReLU/PReLU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
\gls{LReLU}/PReLU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
ELU & \cellcolor{green!25}No & \cellcolor{green!25}Yes & \cellcolor{green!25} No \\
\bottomrule
\end{tabular}
Expand All @@ -120,8 +139,6 @@ \section{Activation Functions}
\end{table}
\footnotetext{The dying ReLU problem is similar to the vanishing gradient problem.}

\glsunset{LReLU}

\begin{table}[H]
\centering
\begin{tabular}{lccclllll}
@@ -173,4 +190,5 @@ \section{Activation Functions}
functions on MNIST.}
\label{table:MNIST-accuracies-activation-functions}
\end{table}
\glsreset{LReLU}
\glsreset{LReLU}
