This is a review of the Pandas course on Kaggle, created by Aleksey Bilogur. The course is great and helped me a lot when I started with Pandas.
What is Pandas?
Pandas is great to know if you need to handle large tables of data in Python. For machine learning practitioners, knowing Pandas is a must in order to efficiently extract, transform and clean data before applying algorithms to machine learning problems. Pandas is the library, and it gives you two main objects: DataFrame (multiple columns, like a table) and Series (a single column). These objects are your containers for data. Pandas works very well with machine learning libraries such as scikit-learn and TensorFlow.
Pandas course overview
Kaggle’s Pandas course is made up of 6 lessons that vary in length from about 30 minutes to maybe 2 hours. As usual, everything on Kaggle is free, and hints and solutions are available for all exercises when you get stuck.
The first lesson is about reading data from CSV files with Pandas, saving data to CSV and creating data in Pandas by converting Python dictionaries to DataFrames (or Series in the case of only one column).
Note: JSON, which is very common in the real world, for example in REST APIs, maps closely onto Python dictionaries and lists. In this lesson you will learn what you need to go from JSON to a Pandas DataFrame.
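To make the first lesson concrete, here is a minimal sketch of going from a dictionary to a DataFrame and round-tripping through CSV. The fruit data and all variable names are made up for illustration and are not taken from the course; an in-memory buffer stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical data, invented for this example.
records = {"fruit": ["apple", "banana", "cherry"], "price": [1.2, 0.5, 3.0]}

# A dictionary of equal-length lists becomes a DataFrame (one key per column).
df = pd.DataFrame(records)

# A single column of values is a Series.
prices = pd.Series([1.2, 0.5, 3.0], name="price")

# Save to CSV and read it back. With a real file you would pass a path
# to to_csv() and read_csv() instead of using a string buffer.
csv_text = df.to_csv(index=False)
df_back = pd.read_csv(io.StringIO(csv_text))

# Parsed JSON often arrives as a list of dictionaries, one per record,
# which the DataFrame constructor also accepts.
json_like = [{"fruit": "apple", "price": 1.2}, {"fruit": "banana", "price": 0.5}]
df_json = pd.DataFrame(json_like)
```

Passing index=False to to_csv() keeps the row index out of the file, so reading it back gives the same two columns.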
Lesson two is important to master. Here you learn how to select parts of a DataFrame given certain conditions. You use the loc (label-based) and iloc (position-based) indexers for your selections, and loc with boolean conditions for conditional selection. You also learn to use the native selectors, how indexing works and how to create a new column in an existing DataFrame.
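The selection techniques from lesson two can be sketched roughly as follows. The small reviews table and its column names are hypothetical, chosen only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical data, invented for this example.
reviews = pd.DataFrame(
    {
        "country": ["Italy", "France", "Italy", "Spain"],
        "points": [87, 92, 95, 88],
        "price": [15.0, 40.0, 60.0, 12.0],
    }
)

# iloc selects by integer position: the first row.
first_row = reviews.iloc[0]

# loc selects by label, and its slices include the end label.
first_two_points = reviews.loc[0:1, "points"]

# Conditional selection: combine boolean conditions with & and parentheses.
italian_top = reviews.loc[(reviews["country"] == "Italy") & (reviews["points"] >= 90)]

# Creating a new column from existing ones.
reviews["points_per_price"] = reviews["points"] / reviews["price"]
```

Note the difference in slicing: reviews.iloc[0:1] returns one row, while reviews.loc[0:1] returns two, because loc includes the end label.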
Lesson three is about summary functions, but more importantly you learn when to use map and apply together with a lambda function. map transforms each value in a Series, while apply can run a custom function on each row of a DataFrame. This is good to know when you need to compute something per row and save the results to a new column.
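A small sketch of the difference between map and apply, assuming a made-up table with points and price columns. The "bargain" rule is purely illustrative.

```python
import pandas as pd

# Hypothetical data, invented for this example.
reviews = pd.DataFrame({"points": [87, 92, 95], "price": [15.0, 40.0, 60.0]})

# A summary function: describe() reports count, mean, min, max and more.
summary = reviews["points"].describe()

# map works on a Series, one value at a time.
mean_points = reviews["points"].mean()
reviews["points_centered"] = reviews["points"].map(lambda p: p - mean_points)

# apply with axis=1 calls the function once per row of the DataFrame.
reviews["label"] = reviews.apply(
    lambda row: "bargain" if row["price"] < 20 and row["points"] >= 85 else "regular",
    axis=1,
)
```

For simple arithmetic like the centering step, writing reviews["points"] - mean_points directly gives the same result and is usually faster; map and apply earn their place when the logic does not fit a built-in operation.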
Lesson four is all about groupby and sorting. If you have a background in SQL, this is the easiest lesson. If you are new to groupby, it is what you use to compute an aggregate such as sum or count for each group of values in a column.
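As a rough illustration of grouping and sorting, the sketch below counts and sums points per country and then ranks the groups. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this example.
reviews = pd.DataFrame(
    {
        "country": ["Italy", "France", "Italy", "Spain"],
        "points": [87, 92, 95, 88],
    }
)

# Roughly: SELECT country, COUNT(points), SUM(points) ... GROUP BY country
per_country = reviews.groupby("country")["points"].agg(["count", "sum"])

# Sort the groups by total points, largest first.
ranked = per_country.sort_values(by="sum", ascending=False)
```

The result of groupby is indexed by the grouping column, so per_country.loc["Italy"] gives the count and sum for Italy.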
Lesson five is a short one about data types and how to handle missing values. For more info on how to handle missing values, check out Kaggle’s course on Data Cleaning.
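The ideas in lesson five might look like this in practice. The table is made up, and filling missing prices with the column mean is just one possible choice, not a recommendation from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values, invented for this example.
reviews = pd.DataFrame(
    {
        "country": ["Italy", None, "Spain"],
        "price": [15.0, np.nan, 12.0],
        "points": [87, 92, 88],
    }
)

# Inspect the data type of one column, then convert it.
points_dtype = reviews["points"].dtype
points_as_float = reviews["points"].astype("float64")

# Find the rows where the price is missing.
missing_prices = reviews[reviews["price"].isna()]

# Fill missing values per column: a placeholder text and the mean price.
filled = reviews.fillna({"country": "Unknown", "price": reviews["price"].mean()})
```

fillna returns a new DataFrame here, so the original reviews table still contains its missing values.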
The sixth and final lesson teaches you how to rename columns and how to join two DataFrames together. The important functions here are concat(), join() and merge(). To learn all about the difference between these three ways of combining data, I strongly recommend reading through the Pandas User Guide on merging.
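A minimal sketch of renaming and the three ways of combining data, using two small hypothetical tables that share an id column.

```python
import pandas as pd

# Hypothetical data, invented for this example.
left = pd.DataFrame({"id": [1, 2, 3], "pts": [87, 92, 95]})
right = pd.DataFrame({"id": [2, 3, 4], "country": ["France", "Italy", "Spain"]})

# Rename a column; rename() returns a new DataFrame.
left = left.rename(columns={"pts": "points"})

# concat stacks DataFrames with the same columns on top of each other.
stacked = pd.concat([left, left], ignore_index=True)

# merge combines on a shared column, like a SQL join (inner join here).
merged = left.merge(right, on="id", how="inner")

# join combines on the index, so set the index first (left join here).
joined = left.set_index("id").join(right.set_index("id"), how="left")
```

In this example the inner merge keeps only ids 2 and 3, which appear in both tables, while the left join keeps all three ids from left and leaves the country empty (NaN) for id 1.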
My thoughts on the Pandas course
Overall, the Pandas course on Kaggle is a great way to start learning this must-have tool if you work with large tables of data. Most likely it will not be enough on its own, but you will learn what you need to google in order to solve the problem at hand. If you haven’t worked with Pandas before, you might find this course worth doing twice, because it gives you all the basics you need.
Facts about Pandas course on Kaggle
It’s completely free, like everything else on Kaggle.
You run everything in your browser and you can easily continue where you left off.
This course takes about 5 to 10 hours depending on your level of Python when you start.
This course covers all the basics of Pandas and DataFrames.
The instructor is Aleksey Bilogur.