
Logistic regression

1. Variable definitions

$m$ : number of training examples

$y$ : vector of labels; each $y^{(i)} \in \{0, 1\}$

$X$ : design matrix; each row of $X$ is a training example, each column of $X$ is a feature

$$X =
\begin{pmatrix}
1 & x^{(1)}_1 & \dots & x^{(1)}_n \\
1 & x^{(2)}_1 & \dots & x^{(2)}_n \\
\vdots & \vdots & \ddots & \vdots \\
1 & x^{(m)}_1 & \dots & x^{(m)}_n \\
\end{pmatrix}$$

$$\theta =
\begin{pmatrix}
\theta_0 \\
\theta_1 \\
\vdots \\
\theta_n \\
\end{pmatrix}$$
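
In Octave the intercept column $x_0 = 1$ is usually prepended by hand. A minimal sketch, assuming the raw inputs sit in an `m`-by-`n` matrix called `features` (a hypothetical name):

```octave
% features: m x n matrix of raw inputs (assumed loaded elsewhere)
m = size(features, 1);           % number of training examples
X = [ones(m, 1), features];      % prepend the x_0 = 1 column -> m x (n+1)
theta = zeros(size(X, 2), 1);    % (n+1) x 1 column vector, initialized to zero
```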

2. Hypothesis

$$x =
\begin{pmatrix}
x_0 \\
x_1 \\
\vdots \\
x_n \\
\end{pmatrix}
$$

$$
h_\theta(x) = g(\theta^T x) = g(x_0\theta_0 + x_1\theta_1 + \dots + x_n\theta_n) = \frac{1}{1 + e^{-\theta^T x}},
$$

where $g$ is the sigmoid function

$$
g(z) = \frac{1}{1 + e^{-z}},
$$

```octave
g = 1 ./ (1 + exp(-z));   % element-wise: z may be a scalar, vector, or matrix
```
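
The cost code below calls this expression through a function named `sigmoid`; a minimal function file wrapping the same line:

```octave
function g = sigmoid(z)
  % SIGMOID  Logistic function, applied element-wise to scalars, vectors, or matrices.
  g = 1 ./ (1 + exp(-z));
end
```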

3. Cost function

$$J(\theta) = \frac{1}{m}\sum_{i=1}^m\left[-y^{(i)}\log(h_\theta(x^{(i)})) - (1-y^{(i)})\log(1 - h_\theta(x^{(i)}))\right],$$

Vectorized Octave implementation:

```octave
h = sigmoid(X * theta);   % m x 1 vector of predicted probabilities
J = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));
```
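
As a sanity check (a made-up toy example, not from the original notes): with $\theta = 0$ every prediction is $0.5$, so the cost must come out to $\log 2 \approx 0.693$ whatever the labels are:

```octave
X = [1 0.5; 1 -1.2];       % m = 2 examples: intercept column plus one feature
y = [1; 0];
theta = zeros(2, 1);       % h_theta(x) = 0.5 for every example
m = length(y);
h = sigmoid(X * theta);
J = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h))   % prints J = 0.6931 = log(2)
```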

4. Goal

Find $\theta$ (a vector here) that minimizes $J(\theta)$.

4.1 Gradient descent

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j,
$$

repeat until convergence {
     $\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_j$ (simultaneously for all $j$; see the sketch below)
}
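
Written element by element to mirror the update above, one sweep of the loop body looks like this in Octave (`alpha` and the stopping test are assumed choices; the gradient is finished before $\theta$ is touched, so the update is simultaneous):

```octave
grad = zeros(size(theta));
for j = 1:length(theta)
  % partial derivative of J with respect to theta_j
  grad(j) = (1 / m) * sum((sigmoid(X * theta) - y) .* X(:, j));
end
theta = theta - alpha * grad;   % update every theta_j at once
```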

Vectorization

$$S=
\begin{pmatrix}
h_\theta(x^{(1)})-y^{(1)} & h_\theta(x^{(2)})-y^{(2)} & \dots & h_\theta(x^{(m)})-y^{(m)}
\end{pmatrix}
\begin{pmatrix}
x^{(1)}_0 & x^{(1)}_1 & \dots & x^{(1)}_n \\
x^{(2)}_0 & x^{(2)}_1 & \dots & x^{(2)}_n \\
\vdots & \vdots & \ddots & \vdots \\
x^{(m)}_0 & x^{(m)}_1 & \dots & x^{(m)}_n \\
\end{pmatrix}
$$

$$=
\begin{pmatrix}
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0 &
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_1 &
\dots &
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_n
\end{pmatrix}
$$

$$
= (h_\theta(X) - y)^T X
$$

$$
\theta := \theta - \frac{\alpha}{m} S^T
$$

$$h_\theta(X) = g(X\theta) = \frac{1}{1 + e^{-X\theta}}$$

$X\theta$ is $m \times 1$ and $y$ is $m \times 1$,

so $\frac{1}{1+e^{-X\theta}} - y$ is $m \times 1$; its transpose is the row vector used in $S$:

$$
\left(\frac{1}{1 + e^{-X\theta}} - y\right)^T =
\begin{pmatrix}
h_\theta(x^{(1)})-y^{(1)} & h_\theta(x^{(2)})-y^{(2)} & \dots & h_\theta(x^{(m)})-y^{(m)}
\end{pmatrix}
$$

$$
\theta := \theta - \frac{\alpha}{m} X^T\left(\frac{1}{1 + e^{-X\theta}} - y\right)
$$
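
In Octave the whole vectorized update collapses to one line per iteration (a sketch; `alpha` and `num_iters` are assumed choices):

```octave
for iter = 1:num_iters
  theta = theta - (alpha / m) * X' * (sigmoid(X * theta) - y);
end
```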

5. Regularized logistic regression

to avoid overfitting (note that a $\lambda$ that is too large causes underfitting instead)

Cost function

$$
J(\theta) = \frac{1}{m}\sum_{i=1}^m\left[-y^{(i)}\log(h_\theta(x^{(i)})) - (1-y^{(i)})\log(1 - h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m} \sum_{j=1}^n \theta^2_j,
$$

where the penalty runs over $j = 1, \dots, n$, so $\theta_0$ is not regularized.

Gradient descent

$$
\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0,
$$

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j + \frac{\lambda}{m}\theta_j, \quad (j \ge 1)
$$
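
A sketch of the regularized cost and gradient in Octave, following the split above where $\theta_0$ is left unpenalized (`lambda` is an assumed input):

```octave
h = sigmoid(X * theta);
reg = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);        % penalty skips theta_0
J = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h)) + reg;

grad = (1 / m) * X' * (h - y);                            % unregularized gradient
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % add lambda term for j >= 1
```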