
Neural Network

1. Variable definitions

$a_i^{(j)}$ : “activation” of unit $i$ in layer $j$

$\theta^{(j)}$ : matrix of weights mapping from layer $j$ to layer $j+1$, of size $S_{j+1} \times (S_j + 1)$

$L$ : total number of layers in the network

$S_l$ : number of units (excluding the bias unit) in layer $l$

$\delta_j^{(l)}$ : error of unit $j$ in layer $l$

sigmoid function : $g(z) = \frac{1}{1 + e^{-z}}$
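Backpropagation below also needs the sigmoid's derivative, $g'(z) = g(z)(1 - g(z))$. A minimal NumPy sketch of both (the helper names `sigmoid` and `sigmoid_gradient` are my own):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    """g'(z) = g(z) * (1 - g(z)); used by backpropagation below."""
    s = sigmoid(z)
    return s * (1.0 - s)
```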

$J(\theta)$: cost function

(figure: a three-layer network with inputs $x_1, x_2, x_3$, a hidden layer of units $a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$, and a single output unit $a_1^{(3)} = h_\theta(x)$)

2. Forward propagation

$a_1^{(2)} = g(\theta_{10}^{(1)}x_0 + \theta_{11}^{(1)}x_1 + \theta_{12}^{(1)}x_2 + \theta_{13}^{(1)}x_3)$

$a_2^{(2)} = g(\theta_{20}^{(1)}x_0 + \theta_{21}^{(1)}x_1 + \theta_{22}^{(1)}x_2 + \theta_{23}^{(1)}x_3)$

$a_3^{(2)} = g(\theta_{30}^{(1)}x_0 + \theta_{31}^{(1)}x_1 + \theta_{32}^{(1)}x_2 + \theta_{33}^{(1)}x_3)$

$h_\theta(x) = a_1^{(3)} = g(\theta_{10}^{(2)}a_0^{(2)} + \theta_{11}^{(2)}a_1^{(2)} + \theta_{12}^{(2)}a_2^{(2)} + \theta_{13}^{(2)}a_3^{(2)})$

Vectorized: $z^{(2)} = \theta^{(1)}x$ and $a^{(2)} = g(z^{(2)})$, with $g$ applied elementwise.

In general, $\theta^{(j)}$ is an $S_{j+1} \times (S_j + 1)$ matrix (the $+1$ accounts for the bias unit).
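As a sketch, the vectorized forward pass for the three-layer network above might look like this in NumPy (reusing `sigmoid` from earlier; the weight names `theta1` and `theta2` are hypothetical):

```python
import numpy as np

def forward_propagate(x, theta1, theta2):
    """Forward pass for a 3-layer network.
    x: length-S_1 input vector (no bias term);
    theta1: S_2 x (S_1 + 1); theta2: S_3 x (S_2 + 1)."""
    a1 = np.insert(x, 0, 1.0)             # prepend bias unit x_0 = 1
    z2 = theta1 @ a1                      # z^{(2)} = theta^{(1)} x
    a2 = np.insert(sigmoid(z2), 0, 1.0)   # a^{(2)} with bias a_0^{(2)} = 1
    z3 = theta2 @ a2                      # z^{(3)} = theta^{(2)} a^{(2)}
    return sigmoid(z3)                    # h_theta(x) = a^{(3)}
```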

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log\left( (h_\theta(x^{(i)}))_k \right) - (1 - y_k^{(i)}) \log\left( 1 - (h_\theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{S_l} \sum_{j=1}^{S_{l+1}} \left( \theta_{ji}^{(l)} \right)^2$$
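A hedged NumPy sketch of $J(\theta)$, assuming `h` and `y` are $m \times K$ arrays of predictions and one-hot labels (all names are mine). Note that the regularization term skips each $\theta^{(l)}$'s bias column, matching the inner sum starting at $i = 1$:

```python
import numpy as np

def nn_cost(h, y, thetas, lam):
    """Regularized cross-entropy cost J(theta).
    h, y: m x K prediction / one-hot label arrays;
    thetas: list of theta^{(l)} matrices; lam: lambda."""
    m = y.shape[0]
    data_term = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    # Bias columns (index 0 of each theta) are excluded from regularization.
    reg_term = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return data_term + reg_term
```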

The goal is to find $\theta$ that minimizes $J(\theta)$. Ignoring the regularization term, the partial derivatives are

$$\frac{\partial J(\theta)}{\partial \theta_{ij}^{(l)}} = a_j^{(l)} \delta_i^{(l+1)}$$
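For a single example this gradient is just an outer product of the error vector and the activation vector. A toy illustration with made-up numbers (unregularized, single output unit):

```python
import numpy as np

a2 = np.array([1.0, 0.7, 0.3, 0.9])  # a^{(2)} including bias a_0^{(2)} = 1
delta3 = np.array([0.2])             # delta^{(3)} for a single output unit

# dJ/dtheta_{ij}^{(2)} = a_j^{(2)} * delta_i^{(3)}
theta2_grad = np.outer(delta3, a2)   # shape 1 x 4, same as theta^{(2)}
```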

3. Backpropagation

For the output layer:

$$\delta_j^{(L)} = a_j^{(L)} - y_j$$

For example, in a network with $L = 4$ layers:

$\delta_j^{(4)} = a_j^{(4)} - y_j$

$\delta^{(3)} = (\theta^{(3)})^T\delta^{(4)} .* g'(z^{(3)})$

$\delta^{(2)} = (\theta^{(2)})^T\delta^{(3)} .* g'(z^{(2)})$

There is no $\delta^{(1)}$, since the input layer carries the data itself and has no error term.
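Putting the example together, a minimal sketch for the $L = 4$ case (reusing `sigmoid_gradient` from earlier; all names are mine). One practical detail the `.*` formulas gloss over: the bias column of each $\theta^{(l)}$ is dropped before propagating the error back, which is equivalent to computing and then discarding $\delta_0$:

```python
import numpy as np

def backprop_deltas(thetas, zs, a_out, y):
    """Compute [delta^{(2)}, delta^{(3)}, delta^{(4)}] for one example.
    thetas: [theta1, theta2, theta3] (theta^{(l)} maps layer l to l+1);
    zs: [z2, z3], pre-activation vectors of the hidden layers;
    a_out: output activations a^{(4)}; y: label vector."""
    deltas = [a_out - y]                 # delta^{(4)} = a^{(4)} - y
    for theta, z in zip([thetas[2], thetas[1]], [zs[1], zs[0]]):
        # Drop theta's bias column before propagating the error back
        # (equivalent to computing and then discarding delta_0).
        d = (theta[:, 1:].T @ deltas[0]) * sigmoid_gradient(z)
        deltas.insert(0, d)
    return deltas
```

Each gradient matrix then follows from the formula above: the unregularized $\partial J / \partial \theta^{(l)}$ for one example is `np.outer(delta[l+1], a[l])`, with `a[l]` including its bias entry.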