Course review: Intermediate Machine Learning on Kaggle

The second scikit-learn course, where you get a broader view of the library

Jan 07, 2022

This course is the continuation of Kaggle’s Intro to Machine Learning course. If you have not taken that one yet, please do so first.

This course is great for learning the scikit-learn library, a powerful machine learning library in Python.

What this course does not teach you is how machine learning algorithms are built. Here you treat the algorithms as black boxes and focus on how to use them in Python.

The introduction promises that you will learn to:

tackle data types often found in real-world datasets (missing values, categorical variables),
design pipelines to improve the quality of your machine learning code,
use advanced techniques for model validation (cross-validation),
build state-of-the-art models that are widely used to win Kaggle competitions (XGBoost), and
avoid common and important data science mistakes (leakage).
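As a rough illustration of the first two points, here is a minimal sketch of how missing values and categorical variables can be handled with scikit-learn's SimpleImputer, OneHotEncoder and ColumnTransformer, in the same spirit as the course material. The toy DataFrame and its column names ("rooms", "color") are hypothetical and made up for this example; it is not the course's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: a numeric column and a categorical column, both with a gap.
X = pd.DataFrame({
    "rooms": [3.0, np.nan, 4.0, 2.0],
    "color": ["red", "blue", np.nan, "red"],
})

# Numeric columns: replace missing values with the column mean.
numeric_transformer = SimpleImputer(strategy="mean")

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
# handle_unknown="ignore" encodes categories unseen during fitting as all zeros.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its own preprocessing.
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, ["rooms"]),
    ("cat", categorical_transformer, ["color"]),
])

X_prepared = preprocessor.fit_transform(X)
print(X_prepared)
```

The output has one numeric column followed by one column per category ("blue", "red"), with the missing room count replaced by the mean of the other three rows.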

Course overview

There are 7 lessons in this course:

Introduction
Missing Values
Categorical Variables
Pipelines
Cross-Validation
XGBoost
Data Leakage

The reading material in every lesson is worth your while, and each lesson comes with exercises. Most of your work in the course is done in the lessons on missing values and categorical variables. In the remaining lessons you are mostly asked to copy and paste code from the reading material.

Overall, I think this course gives a broad view of what you should think about when you structure a machine learning project. Using pipelines improves readability and makes cross-validation easy, which lets you use all of your data for both training and validation instead of setting aside a single fixed validation set. Learning that XGBoost is a great algorithm is useful, but you need to look elsewhere to understand how it works; the official XGBoost manual is a good place to start.
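A minimal sketch of what a pipeline combined with cross-validation can look like. The synthetic data from make_regression, the roughly 10% of values knocked out, the median imputation and the random forest settings are all assumptions for illustration, not the course's exact code. Because the imputer sits inside the pipeline, cross_val_score re-fits it on each training fold only, which also guards against one form of the leakage the course warns about.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical synthetic data standing in for a real dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Knock out roughly 10% of the values so the imputer has something to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan

# Bundle preprocessing and model into one estimator.
# cross_val_score fits a fresh copy per fold, so the imputer's medians
# are learned from the training fold only.
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# scikit-learn maximizes scores, so MAE is reported as a negative number; flip the sign.
scores = -cross_val_score(pipeline, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", scores)
print("Average MAE:", scores.mean())
```

With cv=5, each of the five scores comes from a model trained on 80% of the rows and evaluated on the remaining 20%, so every row lands in a validation fold exactly once.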

Note: If you want to visualize what XGBoost does, look into SHAP values. Kaggle’s Machine Learning Explainability course has two full lessons on them, and I will review that course later on.

Overall, this course is good if you want to learn Python and scikit-learn for machine learning. If you want to learn how machine learning algorithms work, and especially if you plan to use languages other than Python, I would recommend Machine Learning by Andrew Ng (Stanford University) instead.

Facts about Kaggle’s Intermediate Machine Learning course

It’s completely free, like everything else on Kaggle.
You run everything in your browser, and you can easily continue where you left off.
The exercises in this course are not very demanding.
This course takes about 10 hours to complete.
The instructor is Alexis Cook, Head of Kaggle Learn.

Data scientist course reviews with Kristofer Björnström
