Unsupervised Machine Learning

Jul 19, 2021

In a recent article, we discussed supervised machine learning, where models are trained under the supervision of labeled training data. Such models learn from labeled input data and then predict the output for unseen data.

However, there may be times when we don’t have labeled data and need to identify hidden patterns in a given data set. To tackle such situations in machine learning, an unsupervised learning approach is required.

What is Unsupervised Learning?

Unsupervised learning is the process of training a machine with data that hasn’t been classified or labeled. The algorithm is then allowed to respond to that data without any supervision. Here, the task for the machine is to organize the unsorted information into groups based on similarities, patterns, and differences, without any prior training on labeled examples.

Unlike supervised learning, there is no teacher and no labeled training data, which means the machine has to discover hidden patterns in the given unlabeled data by itself.

How Does It Work?

Let’s try to understand Unsupervised Learning with the help of an example.

Assume we have an unlabeled input data set with pictures of three fruits that the machine has never seen before: apples, bananas and mangoes. The machine cannot classify the fruits according to their names since it has no clue what an apple or banana looks like.

However, the machine can apply a suitable algorithm and group the fruits based on their similarities, patterns, and differences, i.e. the input data can be divided into three groups. The first group may contain all the photos of apples, the second the photos of bananas, and the third the mangoes.

The system had never learnt anything about these fruits before, which means no labeled examples were provided to it. This type of machine learning, therefore, enables the system to discover interesting patterns in the given data that were previously not detected.

Types of Unsupervised Learning Algorithm

Unsupervised learning algorithms are commonly grouped into two main types:

1. Clustering

Clustering is a way of arranging given data into groups (clusters) so that items with the most similarities stay in one group, while items with fewer or no similarities fall into another. Cluster analysis tries to identify common characteristics among data elements and groups them according to the presence or absence of such characteristics.

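Common clustering algorithms include hierarchical clustering and the k-means algorithm. Principal Component Analysis, Singular Value Decomposition and Independent Component Analysis are often listed alongside them, but they are dimensionality-reduction (decomposition) techniques rather than clustering methods.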
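To make hierarchical clustering a little more concrete, here is a minimal sketch of the bottom-up (agglomerative) variant. It assumes single linkage and Euclidean distance, and the function name `agglomerative` and the sample points are made up for illustration; real projects would normally rely on a library implementation.

```python
import math

def agglomerative(points, num_clusters):
    """Single-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the two closest clusters and merge them.
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical 2-D points: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
clusters = agglomerative(points, num_clusters=3)
```

Stopping at a chosen number of clusters is only one option; the merge order itself forms a hierarchy that can be cut at any level.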

The fruit example mentioned above uses cluster analysis, which forms clusters of similar data. An apple is small, round, and red in colour; a banana is curved and yellow; a mango is slender and greenish-yellow.

Based on these characteristics, the model learns to distinguish the data. The data points similar to an apple form one cluster. The same approach is applied to bananas and mangoes, so distinct clusters are created. Once the data has been clustered, we can also simply attach a label to each cluster, because the grouping has already been done.
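The fruit example can be sketched with a small k-means implementation. This is only an illustration under stated assumptions: each fruit is reduced to two made-up numeric features (elongation and hue, both scaled to 0..1), and the starting centroids are picked with a deterministic farthest-point rule so the run is reproducible. The function name `kmeans` and all the data values are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of equal-length tuples; returns (labels, centroids)."""
    # Deterministic farthest-point initialisation (an assumption for reproducibility):
    # start from the first point, then keep adding the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical features per photo: (elongation, hue), both scaled to 0..1.
fruits = [
    (0.10, 0.02), (0.12, 0.05), (0.08, 0.03),  # round and red (apple-like)
    (0.90, 0.50), (0.85, 0.48), (0.95, 0.52),  # long and yellow (banana-like)
    (0.40, 0.75), (0.45, 0.80), (0.38, 0.72),  # greenish-yellow (mango-like)
]
labels, centroids = kmeans(fruits, k=3)
print(labels)
```

Note that the algorithm only returns cluster numbers; it has no idea which cluster is "apple". Attaching names to the clusters is a separate step done by a person.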

2. Association

Association rule learning is a type of unsupervised learning approach used to find relationships between different variables in a large database. It identifies groups of items that appear together in the data set.

Association algorithms can improve the efficacy of marketing strategies. Market basket analysis is a good example of association rule learning. For instance, people who buy item X (let’s say a loaf of bread) are more likely to also buy item Y (butter or jam).
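The bread-and-butter rule can be checked with the three standard association measures: support, confidence and lift. The sketch below uses a handful of made-up shopping baskets, and the helper names are chosen for illustration only; it is not a full rule-mining algorithm such as Apriori.

```python
def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    base = count(antecedent, baskets)
    if base == 0:
        return 0.0
    return count(antecedent | consequent, baskets) / base

def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall (above 1 suggests a positive association)."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

# Hypothetical transactions, one set of items per customer.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]

rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
rule_lift = lift({"bread"}, {"butter"}, baskets)
```

In this toy data, three of the four baskets with bread also contain butter, so the rule "bread implies butter" has a confidence of 3/4.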

This is what an unsupervised learning algorithm does: it finds structure in the data directly, without first being shown any labeled examples.

Advantages and Disadvantages

Unsupervised learning is sometimes preferred over supervised learning for a variety of reasons. Here are a few of them:

Data labeling requires a lot of manual labour and expense. Unsupervised learning solves this problem by learning and categorising data without the use of labels.
Labels can be added after the data has been categorised, making the process significantly easier.
It’s useful for discovering patterns in data that aren’t easy to find with traditional approaches.
Unsupervised learning can help in the understanding of raw data, making this an ideal tool for data scientists.
In some ways, this form of learning resembles how humans learn, since the model builds up its understanding from the data itself before producing results.

Let’s have a look at some of the disadvantages of the unsupervised learning algorithm:

The output may be less accurate, since there are no labels to train against or to check the results with.
It can also be time-consuming, since the learning phase, in which the algorithm analyses many possible groupings, may take a long time.
For some projects requiring live data, the model may need to be fed data continuously, which can make the results both slower and less reliable.
The more features the data has, the more complicated the analysis becomes.

Unsupervised learning algorithms have a narrower set of applications than supervised ones. They are mostly used to detect credit card fraud, for genome analysis, and in data pre-processing.

Despite the benefits of unsupervised learning described above, semi-supervised learning may be a useful alternative. It combines supervised and unsupervised learning, using a small amount of labeled data together with a larger amount of unlabeled data. The primary advantage of this approach is that it can reduce the errors of both supervised and unsupervised learning.

For example, the unlabeled data is first clustered, and once a few items in a cluster have been labelled, the rest of that cluster can be categorised automatically. This can take less time and manual effort than labelling every item by hand.
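One simple way to picture this is a "cluster, then label" step: every point inherits the majority label of the few hand-labeled points in its cluster. The sketch below is a simplified illustration, not a complete semi-supervised method; the function name `propagate_labels` and the sample data are hypothetical, and the cluster ids are assumed to come from any clustering algorithm.

```python
from collections import Counter

def propagate_labels(cluster_ids, known_labels):
    """Give each point the majority label of the labeled points in its cluster.

    cluster_ids  : cluster id for every point (from any clustering algorithm)
    known_labels : dict {point index: label} for the few hand-labeled points
    Points in clusters with no labeled member are left as None.
    """
    # Collect the votes cast by labeled points, per cluster.
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(cluster_ids[idx], Counter())[label] += 1
    # Pick the most common label in each cluster that received votes.
    cluster_label = {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}
    return [cluster_label.get(cid) for cid in cluster_ids]

# Hypothetical input: nine photos in three clusters, only two labeled by hand.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
known_labels = {0: "apple", 4: "banana"}
predicted = propagate_labels(cluster_ids, known_labels)
```

Here two hand-made labels are enough to name six photos, while the third cluster stays unlabeled until someone labels at least one of its members.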