How to check for Linear Separability

Mauricio Mazuecos
3 min read · May 25, 2019

Linear separability is a usually desired (but rare) property of data. Here I explain a simple approach to find out if your data is linearly separable.

So, what does it mean for data to be linearly separable? Well, given sets X0 and X1 in an n-dimensional Euclidean space, the two sets are linearly separable if there exist n+1 real numbers w1, w2, …, wn, k such that every point (x1, …, xn) in X0 satisfies w1·x1 + w2·x2 + … + wn·xn > k and every point in X1 satisfies w1·x1 + w2·x2 + … + wn·xn < k.

Graphically, X0 and X1 are linearly separable if there exists a line, plane or hyperplane that separates them (depending on the number of dimensions of our data):
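The definition above is easy to check directly once you have candidate values for w and k. Here is a minimal sketch with made-up 2D points and a hand-picked separator (all values are hypothetical, chosen just to illustrate the inequalities):

```python
import numpy as np

# Hypothetical 2D example: two point clouds and a candidate separator.
# n = 2, so we need n + 1 = 3 real numbers: w1, w2 and k.
X0 = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 0.5]])  # one class
X1 = np.array([[4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])  # the other class
w = np.array([1.0, 1.0])
k = 6.0

# Every point in X0 falls strictly on one side of the line w.x = k,
# and every point in X1 strictly on the other -> linearly separable.
print(all(X0 @ w < k))  # True
print(all(X1 @ w > k))  # True
```

Of course, this only works when someone hands you w and k; the rest of the post is about finding out whether such numbers exist at all.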

Linearly separable data points! Yay \o/

In this example, where we have two dimensions and a fairly small amount of data, it’s pretty easy to find this line by eye. But imagine having 300K 200-dimensional vectors. Then the task gets harder (like… a lot harder). Luckily for us, we don’t have to do this by ourselves!

Support Vector Machines are here to help

SVMs with a linear kernel find the maximum-margin hyperplane that separates the training data. Remember the loss function of an SVM with a linear kernel? If we set the C hyperparameter to a very high number (e.g. 2^32), we force the optimizer to make zero classification errors in order to minimize the loss function. In other words, we overfit the data. If we can overfit it with a linear model, that means the data is linearly separable!

SVM loss function: L(w, b) = ½‖w‖² + C · Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)). With a big value of C, every misclassification is going to be severely penalized

Why should we even bother?

If you’re working on binary classification with a NN, you’re probably using a single-layer perceptron (better known as a dense layer with 1-dimensional output) with a sigmoid activation as the last layer. A single-layer perceptron + sigmoid trained with binary cross-entropy loss is pretty much a logistic regression model, which is a linear model! If the vectors that go into the single-layer perceptron are not linearly separable, chances are your classifier is not going to perform well. Checking for separability thus brings a little interpretability to the results of a NN.
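The equivalence is easy to see by writing out what that last layer actually computes. A minimal sketch with hypothetical weights and inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A dense layer with 1-dimensional output followed by a sigmoid...
def last_layer(x, w, b):
    return sigmoid(x @ w + b)

# ...computes exactly the logistic regression model P(y=1|x) = sigmoid(w.x + b),
# whose decision boundary w.x + b = 0 is linear in x.
x = np.array([1.0, 2.0])    # hypothetical feature vector
w = np.array([0.5, -0.25])  # hypothetical learned weights
b = 0.1
print(last_layer(x, w, b))  # a probability in (0, 1)
```

So whatever the rest of the network does, the final classification step can only draw a linear boundary in the space of vectors it receives.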

Conclusions

The recipe to check for linear separability is:
1- Instantiate an SVM with a linear kernel and a big C hyperparameter (use sklearn for ease).
2- Train the model with your data.
3- Classify the train set with your newly trained SVM.
4- If you get 100% accuracy on classification, congratulations! Your data is linearly separable.
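The recipe above can be sketched in a few lines of sklearn. The blobs below are synthetic data with well-separated centers (chosen here so they are separable by construction; with your own data, only step 4 tells you the answer):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated data so the example is separable by construction.
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.5, random_state=0)

# 1- Instantiate a linear-kernel SVM with a big C hyperparameter.
clf = SVC(kernel="linear", C=2**32)
# 2- Train the model with your data.
clf.fit(X, y)
# 3- Classify the train set with the newly trained SVM.
preds = clf.predict(X)
# 4- 100% train accuracy means the data is linearly separable.
accuracy = (preds == y).mean()
print(f"train accuracy: {accuracy:.3f}")
print("linearly separable!" if accuracy == 1.0 else "not linearly separable")
```

Swap in your own X and y; if the printed train accuracy is below 100%, no separating hyperplane exists for that data.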
