Introduction to Unsupervised Learning

Uncovering Structure with Unsupervised Learning

In our previous lessons, we focused on supervised learning, where our primary goal was to predict a well-defined target variable (y). We had an "answer key" in our historical data, which allowed us to train and evaluate our models' ability to predict outcomes like house prices or iris species.

However, many business problems do not come with a clear-cut target variable. We often have large amounts of data and a general goal to "find something interesting" or "understand our customers better." This is where unsupervised learning comes in.

Unsupervised learning is a class of machine learning techniques used to find patterns, structures, and relationships in data that has not been labeled with a target outcome. Instead of predicting a known answer, the goal is to discover the inherent structure within the data itself.

In this lesson, we will explore two of the most common types of unsupervised learning:

Clustering: The task of automatically grouping similar data points together. This is widely used for applications like customer segmentation, where the goal is to discover distinct groups of customers based on their behavior or demographics.
Dimensionality Reduction: The process of reducing the number of features in a dataset while retaining as much of the important information as possible. This is useful for simplifying models, reducing noise and training time, and enabling visualization of high-dimensional data.
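
To make the two ideas concrete, here is a minimal sketch, assuming scikit-learn is available and reusing the iris measurements mentioned earlier. The species labels are ignored on purpose, and the choices of k-means, PCA, three clusters, and two components are illustrative assumptions rather than the methods this lesson necessarily uses.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load only the feature matrix; the target (species) is deliberately unused.
X = load_iris().data

# Both k-means and PCA are sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Clustering: assign each flower to one of 3 groups (3 is an assumed value).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)

# Dimensionality reduction: compress 4 features into 2 components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("First ten cluster assignments:", cluster_labels[:10])
print("Reduced shape:", X_2d.shape)
print("Variance retained by 2 components:", pca.explained_variance_ratio_.sum())
```

The cluster numbers themselves carry no meaning; only the grouping does. The two-column `X_2d` array can be passed directly to a scatter plot to inspect the data visually.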

Most importantly, we will see that these techniques are not limited to descriptive analysis: their outputs, such as cluster assignments or reduced components, can be fed into modeling workflows as new, more informative features.
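
One way this feature-creation idea might look in practice is sketched below, again assuming scikit-learn and the iris measurements. Appending each sample's distance to every cluster centroid is just one possible approach, and the value of three clusters is an assumption made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Feature matrix only; no target variable is used to build the new features.
X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means (3 clusters is an assumed, illustrative choice).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

# transform() returns each sample's distance to every cluster centroid,
# giving one new numeric column per cluster.
centroid_distances = kmeans.transform(X_scaled)

# Append the new columns to the original features.
X_enriched = np.hstack([X_scaled, centroid_distances])

print(X_scaled.shape, "->", X_enriched.shape)
```

In a real supervised workflow, the scaler and the clustering model would be fit on the training split only and then applied to the test split, so that no information leaks from the evaluation data.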