Next Steps and Active Learning¶
5.0 Next Steps and Module Summary¶
This section provides recommended exercises to practice the concepts of unsupervised learning and concludes with a summary of the entire modeling module.
Homework and Active Learning¶
-
Experiment with a Different Clustering Algorithm:
KMeans
is excellent for spherical clusters, but other algorithms have different strengths.- Task: Use the
make_moons
dataset fromsklearn.datasets
. Apply bothKMeans
and another algorithm calledDBSCAN
to it. Visualize the results from both. - Analysis: How does
DBSCAN
's performance on this non-spherical data compare toKMeans
?
- Task: Use the
-
Explore an Alternative for Visualization: While PCA is great, t-SNE is another powerful technique specifically designed for visualizing high-dimensional data.
- Task: Import
TSNE
fromsklearn.manifold
. Apply it to thedigits
dataset (reducing it to 2 components) and create a scatter plot, just as you did with PCA. - Analysis: Compare the t-SNE visualization to the PCA visualization. Does t-SNE create more distinct visual clusters for the different digits?
- Task: Import
-
Feature Engineering with PCA: In our lab, we used clustering to create a feature. Now, apply PCA for the same purpose.
- Task: Take all the numerical
sqft_
features from the King County housing dataset (sqft_living
,sqft_lot
,sqft_above
, etc.). Use aPipeline
to scale them and then applyPCA
to reduce them to just two "size components." Add these two new features to your dataset and train a regression model. - Analysis: Does your model's performance improve compared to using the original
sqft_
features?
- Task: Take all the numerical
-
Fine-Tune Your Clustering Model:
- Task: Apply the Elbow Method to the Iris dataset's features to programmatically determine the optimal number of clusters.
- Analysis: Does the result from the Elbow Method correctly suggest
k=3
, which we know to be the true number of species?
Lesson and Module Summary¶
In this lesson, you explored unsupervised learning, discovering how to find inherent structure in data without a target variable. You learned to use K-Means for clustering to identify meaningful segments and PCA for dimensionality reduction to simplify and visualize complex data. Most importantly, you saw how these powerful techniques can be used to engineer better features for your supervised models.
This concludes our module on modeling. You have progressed from understanding the basic supervised learning workflow in Lesson 1 to building robust, automated pipelines with hyperparameter tuning in Lesson 2. Now, in Lesson 3, you've added unsupervised learning to your toolkit. You now possess a comprehensive framework for building, evaluating, and improving machine learning models to solve a wide array of data science problems.