I started this repository to practice training machine learning models and as a set of notes on what I've learned from the Stanford CS229: Machine Learning (Autumn 2018) course on YouTube.
The files follow the order of the lectures with slight changes. Variable names are chosen to match the notation in the theory as closely as possible, which means most of them consist of Greek letters and subscripts encoded in Unicode. In the Julia REPL, type the LaTeX-like abbreviation and press Tab to have Julia print the character, which looks like:
```
julia> \pi
```

and with Tab pressed:

```
julia> π
π = 3.1415926535897...

julia>
```
Some Julia IDEs and editor extensions also support this.
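As a small, purely illustrative sketch of this naming style (the variable names and values below are made up, not taken from the course files), Julia code written this way might look like:

```julia
# Greek-letter names are typed via LaTeX abbreviations + Tab in the REPL:
# \theta<Tab> → θ, \alpha<Tab> → α, \sigma<Tab> → σ, \nabla<Tab> → ∇
θ = [0.5, -1.2]           # parameter vector
α = 0.01                  # learning rate
σ(z) = 1 / (1 + exp(-z))  # sigmoid activation

# one gradient-descent-style update on θ (gradient values are made up)
∇ = [0.1, -0.3]
θ = θ .- α .* ∇
```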
The dataset here stores each sample as a row vector, with the samples stacked in a column, fitting the form:

$$X = \begin{bmatrix} x^{(1)} \\ x^{(2)} \\ \vdots \\ x^{(m)} \end{bmatrix} \in \mathbb{R}^{m \times n}$$

where each $x^{(i)}$ is a $1 \times n$ row vector.
The labels are stored the same way. But to make the matrix calculus easier, we place the samples in a row instead, so that each sample becomes a column:

$$X = \begin{bmatrix} x^{(1)\top} & x^{(2)\top} & \cdots & x^{(m)\top} \end{bmatrix} \in \mathbb{R}^{n \times m}$$
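A minimal sketch of the two layouts (the sample values and names here are illustrative):

```julia
# three samples, each a 2-dimensional feature vector
x¹ = [1.0, 2.0]
x² = [3.0, 4.0]
x³ = [5.0, 6.0]

# storage layout: each sample is a row, samples stacked vertically (m × n)
X_rows = [x¹'; x²'; x³']   # 3 × 2

# layout used in the derivations: each sample is a column (n × m)
X_cols = [x¹ x² x³]        # 2 × 3

X_rows' == X_cols          # the two layouts are transposes of each other
```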
In this form, the forward pass inside the loss function of the NN is written as:

$$a^{[n]} = g\left(W^{[n]} a^{[n-1]} + b^{[n]}\right)$$

where $a^{[n]}$ is the activation of layer $n$, $W^{[n]}$ is the weight matrix, and $b^{[n]}$ is the bias. The shape of $a^{[n]}$, as well as $b^{[n]}$, for one single sample is $s_n \times 1$, where $s_n$ is the number of neurons at layer $n$; the shape of $W^{[n]}$ is $s_n \times s_{n-1}$.
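These shapes can be checked with a minimal single-layer forward pass (the layer sizes and random weights below are made up for illustration):

```julia
σ(z) = 1 ./ (1 .+ exp.(-z))   # elementwise sigmoid as the activation g

s₀, s₁ = 3, 4                 # neurons in layer 0 (input) and layer 1
a⁰ = randn(s₀, 1)             # one sample as a column vector: s₀ × 1
W¹ = randn(s₁, s₀)            # weight matrix: s₁ × s₀
b¹ = randn(s₁, 1)             # bias: s₁ × 1

z¹ = W¹ * a⁰ .+ b¹            # pre-activation: s₁ × 1
a¹ = σ(z¹)                    # activation: s₁ × 1
size(a¹)                      # (4, 1)
```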