
Logistic Regression


May 03, 2020

Binary Classification

We are looking to classify $y \in \{0,1\}$, a binary outcome.

Example (Tumour Classification)

[Figure: tumour classification data]

In an example like this, we could try using linear regression with a threshold classifier:

  • If $h_\theta(x) \geq 0.5$, predict $y=1$
  • If $h_\theta(x) < 0.5$, predict $y=0$

However, outliers can skew the regression line, giving us a worse hypothesis. For this reason, linear regression isn't fantastic for binary classification.

Linear regression can also output values outside the interval $[0,1]$, again making it a weak model for binary classification.

Introducing Logistic Regression

For this type of model, we want:

$$0 \leq h_\theta(x) \leq 1$$

For this reason, we will need to change our definition of $h_\theta(x)$ from what was used in linear regression. Our new hypothesis will be defined as:

$$h_\theta(x) = g(\theta^T x), \quad g(z) = \frac{1}{1+e^{-z}}$$

Where $g(z)$ can be referred to as the sigmoid or logistic function. The full version of our hypothesis function is:

$$h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$$

As before, we want to fit the parameters $\theta$.
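As a quick illustration, here is a minimal NumPy sketch of $g$ and $h_\theta$ (the `sigmoid` and `hypothesis` names are my own, not from the notes above):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x must include the bias term x_0 = 1."""
    return sigmoid(theta @ x)
```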

Interpretation of Hypothesis

When $h_\theta(x)$ outputs a number, we will interpret it as the estimated probability that $y=1$ on input $x$.

Returning to the tumour example, suppose we have:

$$x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{Tumour Size} \end{bmatrix}, \quad h_\theta(x) = 0.7$$

i.e. the estimated probability that $y=1$ is $0.7$, so we say that the patient has a $70\%$ chance of the tumour being malignant. More formally, we can write this as:

$$h_\theta(x) = \Pr(y=1 \mid x; \theta)$$

Since $y$ can only be either $0$ or $1$, we can also compute the probability that $y=0$:

$$\Pr(y=0 \mid x; \theta) = 1 - \Pr(y=1 \mid x; \theta)$$
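As a numeric sanity check, here is a self-contained sketch with made-up values of $\theta$ and tumour size, chosen so that $h_\theta(x) \approx 0.7$ as in the example above:

```python
import numpy as np

theta = np.array([-3.0, 1.0])  # made-up fitted parameters
x = np.array([1.0, 3.85])      # [x_0 = 1, tumour size] -- made-up input

p_malignant = 1.0 / (1.0 + np.exp(-(theta @ x)))  # h_theta(x) = Pr(y = 1 | x; theta)
p_benign = 1.0 - p_malignant                      # Pr(y = 0 | x; theta)
print(p_malignant, p_benign)                      # ~0.70, ~0.30
```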

Decision Boundary

For logistic regression, we define the decision rule for our hypothesis as:

  • Predict $y=1$ if $h_\theta(x) \geq 0.5$
  • Predict $y=0$ if $h_\theta(x) < 0.5$

When will we end up in each case? If we look at the sigmoid function:

[Figure: the sigmoid function $g(z)$]

  • $h_\theta(x) \geq 0.5$ when $\theta^T x \geq 0$
  • $h_\theta(x) < 0.5$ when $\theta^T x < 0$
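Since the sigmoid crosses $0.5$ exactly at $z=0$, the classifier only needs the sign of $\theta^T x$; a minimal sketch (the `predict` name is my own):

```python
import numpy as np

def predict(theta, x):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. h_theta(x) >= 0.5."""
    return 1 if theta @ x >= 0 else 0
```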

Example

Consider the following dataset:

[Figure: example dataset]

Our hypothesis would therefore be:

$$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$$

Further suppose that our fitted values of $\theta$ are:

$$\theta = \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}$$

In this case, we will predict $y=1$ if:

$$\begin{aligned} -3 + x_1 + x_2 &\geq 0 \\ x_1 + x_2 &\geq 3 \end{aligned}$$

This is the equation of the following decision boundary line:

[Figure: linear decision boundary]

Therefore, we will predict $y=1$ for any point that falls to the right of the line, and $y=0$ for any point to the left.
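We can check the worked example numerically; a sketch assuming the hypothetical `predict` helper from above is in scope:

```python
theta = np.array([-3.0, 1.0, 1.0])

# (x1, x2) = (2, 2) lies right of the line: -3 + 2 + 2 = 1 >= 0
print(predict(theta, np.array([1.0, 2.0, 2.0])))  # -> 1
# (x1, x2) = (1, 1) lies left of the line: -3 + 1 + 1 = -1 < 0
print(predict(theta, np.array([1.0, 1.0, 1.0])))  # -> 0
```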

Non-linear Decision Boundary

Consider the following dataset:

[Figure: dataset requiring a non-linear decision boundary]

In linear regression, we were able to add higher-order polynomial terms for situations like this. We can do the same with logistic regression. Suppose our hypothesis looks like:

$$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$$

Further suppose that our fitted values are:

$$\theta = \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$$

Therefore, since $\theta^T x \geq 0$ implies $h_\theta(x) \geq 0.5$, we predict $y=1$ if:

$$\begin{aligned} -1 + x_1^2 + x_2^2 &\geq 0 \\ x_1^2 + x_2^2 &\geq 1 \end{aligned}$$

This produces the following decision boundary:

[Figure: circular decision boundary]

You can use polynomial terms to fit decision boundaries that take on many different shapes and sizes, not just circles.
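The same hypothetical `predict` helper from above handles the circular boundary once we build the quadratic feature vector ourselves; a sketch (the `poly_features` name is my own, with the feature ordering matching the hypothesis above):

```python
def poly_features(x1, x2):
    """Feature vector [1, x1, x2, x1^2, x2^2] for the quadratic hypothesis."""
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

print(predict(theta, poly_features(2.0, 0.0)))  # x1^2 + x2^2 = 4 >= 1, outside the circle -> 1
print(predict(theta, poly_features(0.1, 0.1)))  # x1^2 + x2^2 = 0.02 < 1, inside the circle -> 0
```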

