Machine learning involves predicting and classifying data, and to do so we choose among various machine learning algorithms depending on the dataset.
In its basic form, a Support Vector Machine (SVM) is a linear model for classification and regression problems. It is popular because it often produces good accuracy with relatively little computation power.
The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly classifies the data points.
The simple idea of SVM is that it creates a line or hyperplane which separates data into classes.
In this blog I will explain the basic theory behind SVMs and its application to datasets that are not linearly separable.
First of all, let’s discuss what a hyperplane is.
Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. Also, the dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
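To make the idea of "falling on either side" concrete, here is a minimal sketch. It assumes the hyperplane is written as w·x + b = 0, where the weights `w` and offset `b` are hypothetical values picked for illustration and not something learned from data. The sign of w·x + b tells us which side a point is on.

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of w.x + b = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical example with 2 features: the hyperplane is the line x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
labels = [side_of_hyperplane(w, b, p) for p in [(0.5, 1.0), (3.0, 2.0)]]
print(labels)
```

The same function works unchanged for 3 or more features, because only the length of `w` and `x` changes.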
Now, let’s move on to the explanation of SVM.
As a first approximation, what an SVM does is find a separating line or hyperplane between data of two classes. It takes the data as input and outputs a hyperplane that separates those classes, if possible.
Let’s start with a problem. Suppose you have a dataset as shown below and you need to classify the red rectangles from the blue ellipses (let’s say positives from the negatives). So your task is to find an ideal line that separates this dataset into two classes (say red and blue).
To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find the optimal hyperplane, the one that has the maximum margin. So how does SVM find the ideal one?
Let’s take two possible hyperplanes and figure it out.
We have two lines here, the green line and the yellow line. Which line, according to you, best separates the data?
If you selected the yellow line then you are right, because that’s the line we are looking for. It’s visually quite intuitive in this case that the yellow line classifies better. But we need something concrete to fix our line.
The green line in the image above is quite close to the red class. Though it classifies the current dataset correctly, it does not generalize well, and in machine learning our goal is to get a more generalized separator.
Let’s see SVM’s way of finding the best line.
According to the SVM algorithm, we find the points closest to the line from both classes. These points are called support vectors. Now, we compute the distance between the line and the support vectors. This distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum, i.e. the one that is as far as possible from the nearest data points of both classes, is the optimal hyperplane. Maximizing the margin leaves some room for error, so that future data points can be classified with more confidence.
Thus SVM tries to make a decision boundary in such a way that the separation between the two classes is as wide as possible.
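The margin comparison can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the red and blue points are made up, and the "green" and "yellow" lines are hand-picked candidates written as w·x + b = 0. This only compares two given lines; it is not the optimization an actual SVM solver performs.

```python
import math

def distance_to_line(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Distance from the hyperplane to its closest point."""
    return min(distance_to_line(w, b, p) for p in points)

def support_vectors(w, b, points):
    """Points whose distance to the hyperplane equals the margin."""
    m = margin(w, b, points)
    return [p for p in points if math.isclose(distance_to_line(w, b, p), m)]

# Made-up, linearly separable data.
red = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
blue = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
points = red + blue

# Two hand-picked candidates that both separate the classes.
green = ([1.0, 1.0], -3.5)   # x + y = 3.5, hugs the red class
yellow = ([1.0, 1.0], -5.5)  # x + y = 5.5, midway between the classes

green_margin = margin(*green, points)
yellow_margin = margin(*yellow, points)
yellow_sv = support_vectors(*yellow, points)
print(green_margin, yellow_margin, yellow_sv)
```

Both lines classify the made-up data correctly, but the yellow one has the larger margin, and its support vectors come from both classes.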
So this is a simple example, but what if you have a slightly more complex dataset, one which is not linearly separable? Let’s see another example.
This data is clearly not linearly separable. We cannot draw a straight line that can classify this data. But this data can be converted to linearly separable data in a higher dimension. Let’s add one more dimension and call it the z-axis. Let the co-ordinates on the z-axis be governed by the constraint,
z = x²+y²
So, basically, the z co-ordinate is the square of the distance of the point from the origin. Let’s plot the data on the z-axis.
Now the data is clearly linearly separable. Let the purple line separating the data in the higher dimension be z = k, where k is a constant. Since z = x² + y², we get x² + y² = k, which is the equation of a circle. So we can project this linear separator in the higher dimension back to the original dimensions using this transformation, where it becomes a circular decision boundary.
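Here is a small sketch of this trick, assuming a made-up dataset: one class sits on a small circle around the origin and the other on a larger ring. The radii, the sample angles and the way `k` is picked (halfway between the two classes on the z-axis) are all illustrative choices, not something an SVM would compute this way.

```python
import math

def lift(point):
    """Map a 2-D point (x, y) to its z co-ordinate, z = x^2 + y^2."""
    x, y = point
    return x * x + y * y

# Made-up data: an inner class at radius 0.5 and an outer class at radius 3.
angles = [0, 1, 2, 3, 4, 5]
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
outer = [(3.0 * math.cos(t), 3.0 * math.sin(t)) for t in angles]

z_inner = [lift(p) for p in inner]
z_outer = [lift(p) for p in outer]

# On the z-axis a single threshold z = k separates the two classes.
k = (max(z_inner) + min(z_outer)) / 2

def classify(point, k):
    """In the original 2-D space this is the circle x^2 + y^2 = k."""
    return "outer" if lift(point) > k else "inner"

predictions = [classify(p, k) for p in inner + outer]
print(k, predictions)
```

Back in the original two dimensions, the threshold z = k corresponds to a circle of radius √k around the origin.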
Thus we can classify data by adding an extra dimension to it so that it becomes linearly separable, and then projecting the decision boundary back to the original dimensions using a mathematical transformation. But finding the correct transformation for any given dataset isn’t that easy.