Posted 26 Jul 2018


An Introduction to Basic Algorithms of Machine Learning (Part 4)

An Introduction to Logistic Regression

Introduction

In this tip, I will introduce logistic regression, a classification algorithm, together with an optimization algorithm for fitting it. In statistics, the logistic model is a statistical model that is usually applied to a binary dependent variable; in regression analysis, logistic regression estimates the parameters of a logistic model (Wikipedia). I'll also introduce gradient ascent and its modified version, stochastic gradient ascent.

Background

I have learned from many sources and you can refer to them in the References section. This is a summary of my learning.

Logistic Regression

Logistic regression is a classification algorithm that works by trying to learn a function that approximates P(Y|X). It makes the central assumption that P(Y|X) can be approximated as a sigmoid function applied to a linear combination of input features. This assumption is often written in the equivalent forms:

P(Y = 1 | X = x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx))

where:

σ(z) = 1 / (1 + e^(−z))

In logistic regression, θ (theta) is a vector of parameters of length n and we are going to learn the values of those parameters based on m training examples. The number of parameters should be equal to the number of features (xi) of each data point. The function σ(z) (sigma z) is called the logistic function (or sigmoid function).
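A few evaluations make the shape of the sigmoid concrete: it maps any real number into the interval (0, 1), which is what lets us read its output as a probability. A minimal sketch (the function name and sample inputs are mine, not from the original tip):

```python
import math

def sigmoid(z):
    # logistic (sigmoid) function: maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))            # 0.5: the midpoint of the curve
print(round(sigmoid(4), 3))  # 0.982: large positive z approaches 1
print(round(sigmoid(-4), 3)) # 0.018: large negative z approaches 0
```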

Log Likelihood

In order to choose values for the parameters of logistic regression, we use maximum likelihood estimation (MLE). We can write the likelihood of all the data:

L(θ) = ∏ σ(θᵀx(i))^y(i) · [1 − σ(θᵀx(i))]^(1 − y(i))

where the product runs over the m training examples (x(i), y(i)).

The log likelihood equation is:

ℓ(θ) = Σ y(i)·log σ(θᵀx(i)) + [1 − y(i)]·log[1 − σ(θᵀx(i))]

Now that we have a function for log-likelihood, we simply need to choose the values of theta that maximize it. We can find the best values of theta by using an optimization algorithm. The optimization algorithm we will use requires the partial derivative of log likelihood with respect to each parameter:

∂ℓ(θ)/∂θj = Σ [y(i) − σ(θᵀx(i))]·xj(i)

We are ready for our optimization algorithm. To do so, we employ an algorithm called gradient ascent. The idea behind gradient ascent is that gradients point “uphill”. If you continuously take small steps in the direction of your gradient, you will eventually make it to a local maximum. The update to our parameters that results in each small step can be calculated as:

θj := θj + η·Σ [y(i) − σ(θᵀx(i))]·xj(i)

Where η (eta) is the magnitude of the step size we take, also called the learning rate. If you keep updating θ using the equation above, you will converge on the best values of θ. The expression in the square brackets is also known as the error between the actual class and the predicted class.
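One update step can be worked through by hand. The sketch below uses a single hypothetical training example and made-up numbers (the feature values, label, and learning rate are mine, chosen only for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# one hypothetical training example with two features
x = [1.0, 2.0]       # feature values
y = 1                # actual class label
theta = [0.0, 0.0]   # initial parameters
eta = 0.1            # learning rate (step size)

# predicted probability that y = 1 under the current parameters
pred = sigmoid(sum(t * xi for t, xi in zip(theta, x)))   # sigmoid(0) = 0.5
# the error between the actual class and the predicted class
error = y - pred                                          # 0.5
# gradient-ascent update: theta_j := theta_j + eta * error * x_j
theta = [t + eta * error * xi for t, xi in zip(theta, x)]
print(theta)   # [0.05, 0.1]
```

Because the prediction (0.5) was below the true label (1), every parameter moves in the direction that raises the predicted probability.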

Using the Code

We will focus on the binary classification problem in which y can take on only two values, `0` and `1`. `0` is also called the negative class, and `1` the positive class, and they are sometimes also denoted by the symbols "`-`" and "`+`". Given x(i), the corresponding y(i) is also called the label for the training example.

Suppose we have 10 data points and each point has three numeric features: X1, X2 and X3. To simplify, we will assume X1 = 1 (a constant bias feature). So we have:

z = θ1·X1 + θ2·X2 + θ3·X3 = θ1 + θ2·X2 + θ3·X3

Our data points will be stored in the mySample.txt file; its content can look like this:
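These ten lines are reconstructed from the `dataMat` and `labelMat` values printed later in this tip; each line holds the two feature values (X2 and X3) followed by the label:

```
-3.0 -2.0 0
-2.0 3.0 0
-1.0 -4.0 0
2.0 3.0 0
3.0 4.0 0
-1.0 9.0 1
2.0 14.0 1
1.0 17.0 1
3.0 12.0 1
0.0 8.0 1
```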

We will insert X1 = 1 into each data point later in the Python code. The following `loadDataSet()` function loads the data from mySample.txt:

Python
```
def loadDataSet():
    # storing [X1, X2, X3] for each data point
    dataMat = []
    # storing labels, 0 or 1
    labelMat = []
    fr = open('mySample.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # notes: we are inserting X1 = 1 into the dataMat
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    fr.close()
    return dataMat, labelMat
```

We can test the result of the `dataMat` and the `labelMat` by using the following lines of code:

Python
```
dataMat, labelMat = loadDataSet()
print(dataMat)
print(labelMat)
```

The `dataMat` looks like this:

```[[1.0, -3.0, -2.0], [1.0, -2.0, 3.0], [1.0, -1.0, -4.0],
[1.0, 2.0, 3.0], [1.0, 3.0, 4.0], [1.0, -1.0, 9.0],
[1.0, 2.0, 14.0], [1.0, 1.0, 17.0], [1.0, 3.0, 12.0], [1.0, 0.0, 8.0]]```

The `labelMat`:

`[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]`

We’ll try to use gradient ascent to fit the best parameters or weights (θi) for the logistic regression model to our data. In matrix form, where X is the matrix of data points and y is the column vector of labels, the thetas can be calculated as follows:

θ := θ + η · Xᵀ · (y − σ(X·θ))

The lines of Python code can look like this:

Python
```
from numpy import exp, mat, ones, shape

def sigmoid(z):
    # the logistic (sigmoid) function
    return 1.0 / (1.0 + exp(-z))

dataMat, labelMat = loadDataSet()
# convert the dataMat to a matrix object
dataMatrix = mat(dataMat)
# convert the labelMat to a matrix object and transpose it into a column vector
labelMatrix = mat(labelMat).transpose()
# get m, n from the dataMatrix
# m is the number of data points, n is the number of features (xi) of each data point
m, n = shape(dataMatrix)
eta = 0.001
thetas = ones((n, 1))
# apply the sigmoid function to get the predictions
sig = sigmoid(dataMatrix * thetas)
error = labelMatrix - sig
# update thetas by one gradient-ascent step
thetas = thetas + eta * dataMatrix.transpose() * error
```

The learning process must be repeated many times (such as 500 times), so the lines of code can be rewritten as follows:

Python
```
...
numIter = 500
for k in range(numIter):
    # apply the sigmoid function to get the predictions
    sig = sigmoid(dataMatrix * thetas)
    error = labelMatrix - sig
    # update thetas by one gradient-ascent step
    thetas = thetas + eta * dataMatrix.transpose() * error
```

All of the above code can be put into the `gradAscent()` function:

Python
```
def gradAscent(dataMat, labelMat, numIter):
    # convert the dataMat to a matrix object
    dataMatrix = mat(dataMat)
    # convert the labelMat to a matrix object and transpose it into a column vector
    labelMatrix = mat(labelMat).transpose()
    # get m, n from the dataMatrix
    # m is the number of data points, n is the number of features (xi) of each data point
    m, n = shape(dataMatrix)
    eta = 0.001
    thetas = ones((n, 1))
    for k in range(numIter):
        # apply the sigmoid function to get the predictions
        sig = sigmoid(dataMatrix * thetas)
        error = labelMatrix - sig
        # update thetas by one gradient-ascent step
        thetas = thetas + eta * dataMatrix.transpose() * error
    return thetas
```
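A quick way to exercise `gradAscent()` is to call it on the ten sample points. They are repeated inline here (copied from the `dataMat` and `labelMat` shown earlier) so the snippet is self-contained, with the `sigmoid()` helper and `gradAscent()` as described in this tip:

```python
from numpy import exp, mat, ones, shape

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def gradAscent(dataMat, labelMat, numIter):
    dataMatrix = mat(dataMat)
    labelMatrix = mat(labelMat).transpose()
    m, n = shape(dataMatrix)
    eta = 0.001
    thetas = ones((n, 1))
    for k in range(numIter):
        sig = sigmoid(dataMatrix * thetas)
        error = labelMatrix - sig
        thetas = thetas + eta * dataMatrix.transpose() * error
    return thetas

# the ten sample points with X1 = 1 already inserted, and their labels
dataMat = [[1.0, -3.0, -2.0], [1.0, -2.0, 3.0], [1.0, -1.0, -4.0],
           [1.0, 2.0, 3.0], [1.0, 3.0, 4.0], [1.0, -1.0, 9.0],
           [1.0, 2.0, 14.0], [1.0, 1.0, 17.0], [1.0, 3.0, 12.0],
           [1.0, 0.0, 8.0]]
labelMat = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

thetas = gradAscent(dataMat, labelMat, 500)
# one learned weight per feature (X1, X2, X3)
print(shape(thetas))   # (3, 1)
```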

We’re solving for a set of weights (or parameters) used to make a line that separates the two classes of data (0 and 1), and we can plot this line by using `matplotlib`. The following lines of code make a scatter plot of the data points in the two classes (0 and 1):

Python
```
import matplotlib.pyplot as plt
import numpy as np
from numpy import shape

dataMat, labelMat = loadDataSet()
thetas = gradAscent(dataMat, labelMat, 500)
thetaArr = np.array(thetas)
dataArr = np.array(dataMat)
# get the number of data points
m = shape(dataArr)[0]
xcord1 = []; ycord1 = []
xcord2 = []; ycord2 = []
# classifying data points in two classes (0 and 1)
for i in range(m):
    if int(labelMat[i]) == 1:
        xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])
    else:
        xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])
# making a scatter plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
for j in range(len(xcord1)):
    ax.annotate("1", (xcord1[j], ycord1[j]))
ax.scatter(xcord2, ycord2, s=30, c='green')
for k in range(len(xcord2)):
    ax.annotate("0", (xcord2[k], ycord2[k]))
```

Next, to make a line that separates the different classes of data (0 and 1), we must calculate the Xi (i = 1, 2, 3) on the line by setting z = 0 and solving the following equation:

θ1·X1 + θ2·X2 + θ3·X3 = 0

where X1 = 1 (our assumption) and X2 is given, and so X3 is:

X3 = (−θ1 − θ2·X2) / θ3

(in the code below, θ1, θ2, θ3 are stored as `thetaArr[0]`, `thetaArr[1]`, `thetaArr[2]`)

Our lines of code will look like this:

Python
```
from numpy import arange

# X2 is given a range of values
X2 = arange(-3.0, 3.0, 0.1)
# calculating X3 so that each (X2, X3) point lies on the separating line
X3 = (-thetaArr[0] - thetaArr[1] * X2) / thetaArr[2]
ax.plot(X2, X3)
plt.xlabel('X2'); plt.ylabel('X3')
plt.show()
```

The result:

As you see, the best-fit line is not a good separator of the data. We will improve this by using stochastic gradient ascent. Stochastic gradient ascent is an example of an online learning algorithm: it is known as online because we can incrementally update the classifier as new data comes in, rather than processing all of the data at once.

We will modify the `gradAscent()` function with the following notes:

• update the thetas using only one instance at a time
• the variables `sig` and `error` are now single values rather than vectors
• `eta` changes on each iteration, decreasing as the number of iterations grows
• the instance used to update the thetas is selected at random, and each instance is used only once per pass

The modified version of the `gradAscent()` function called the `stocGradAscent()` can look like this:

Python
```
import random
from numpy import array, ones, shape

def stocGradAscent(dataMat, labelMat, numIter):
    dataMatrix = array(dataMat)
    m, n = shape(dataMatrix)
    thetas = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # eta decreases as the iterations go on, but never reaches 0
            eta = 4 / (1.0 + j + i) + 0.01
            # randomly select an instance that has not yet been used in this pass
            randIndex = int(random.uniform(0, len(dataIndex)))
            chosen = dataIndex[randIndex]
            sig = sigmoid(sum(dataMatrix[chosen] * thetas))
            error = labelMat[chosen] - sig
            thetas = thetas + eta * error * dataMatrix[chosen]
            del(dataIndex[randIndex])
    return thetas
```

So far, the result looks like this — the fitted line now separates the two classes much better:
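Once the thetas are learned, classifying a new point only requires the sigmoid and a 0.5 threshold. This `classify()` helper is a sketch of my own, not part of the original tip (the example thetas are made-up values chosen so the result is easy to verify by hand):

```python
from numpy import array, exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# hypothetical helper: label a data point [X1, X2, X3] with learned thetas
# by thresholding the predicted probability at 0.5
def classify(inX, thetas):
    prob = sigmoid(sum(array(inX) * array(thetas)))
    return 1 if prob > 0.5 else 0

# with thetas = [1, 0, 0], z = 1 and sigmoid(1) ≈ 0.73 > 0.5, so class 1
print(classify([1.0, 0.0, 8.0], [1.0, 0.0, 0.0]))    # 1
# with thetas = [-1, 0, 0], z = -1 and sigmoid(-1) ≈ 0.27 < 0.5, so class 0
print(classify([1.0, 0.0, 8.0], [-1.0, 0.0, 0.0]))   # 0
```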

Points of Interest

In this tip, I introduced logistic regression, an algorithm that finds best-fit parameters for a nonlinear function called the sigmoid. I also introduced gradient ascent and its modified version, stochastic gradient ascent. You can view all of the source code for this tip here.

History

• 26th July, 2018: Initial version
• 21st August, 2018: Second version
• 26th August, 2018: Third version
