“Great… another article on how to make Pandas n times faster.”
I must have said that countless times over the past two years I have been using Pandas. The most recent one I saw promised to “make Pandas 71,803 times faster”.
But I won’t make you that kind of promise. I will just show you how to use Pandas in the fastest way possible, because you can’t speed up something that is already fast. Doesn’t make sense?
Let me ask you this: which hand do you use when I say, “Touch your right ear”? The right hand, of course. You…
Learn to write custom Sklearn preprocessing transformers that make your code exceptional.
predict - how awesome would that be?
You get the data, fit your pipeline just one time, and it takes care of everything — preprocessing, feature engineering, modeling, everything. All you have to do is call predict and get the output.
What kind of pipeline is that powerful? Yes, Sklearn has many transformers, but it doesn’t have one for every imaginable preprocessing scenario. So, is such a pipeline a pipe dream?
Absolutely not. Today, we will learn how to create custom Sklearn transformers that enable…
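As a taste of the idea, here is a minimal sketch of a custom transformer. The `LogTransformer` name and the log1p choice are illustrative, not from the article — any class that implements `fit` and `transform` and inherits from `BaseEstimator` and `TransformerMixin` can slot into a Sklearn pipeline:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

class LogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: applies log1p to every feature."""

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn from the data.
        return self

    def transform(self, X):
        return np.log1p(X)

# The custom step drops into a pipeline like any built-in transformer.
pipe = Pipeline([
    ("log", LogTransformer()),
    ("model", LinearRegression()),
])
```

Because the class follows the Sklearn estimator contract, `pipe.fit(X, y)` runs the preprocessing and the model together, and `pipe.predict` reapplies the same transform automatically.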
One of the difficult stages of my learning journey was about overcoming my fear of massive datasets. It wasn’t easy because working with million-row datasets was nothing like the tiny, toy datasets the online courses continuously gave me.
Today, I am here to share the concepts and tricks I have learned to handle the challenges of gigabyte-sized datasets with millions or even billions of rows. By the end, they will feel almost as natural to you as working with Iris or Titanic.
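One of the simplest of those tricks is streaming a file in chunks instead of loading it whole, so memory use stays bounded by the chunk size. A minimal sketch (the file and column here are synthetic stand-ins for a real multi-gigabyte CSV):

```python
import numpy as np
import pandas as pd

# Build a tiny sample CSV to stand in for a huge file (placeholder data).
pd.DataFrame({"amount": np.arange(10)}).to_csv("sample.csv", index=False)

# Stream the file in fixed-size chunks; only one chunk lives in memory
# at a time, so the same loop works on files far larger than RAM.
total = 0
for chunk in pd.read_csv("sample.csv", chunksize=3):
    total += chunk["amount"].sum()

print(total)  # 45
```

On a real dataset you would raise `chunksize` to something like a million rows and do your aggregation or filtering inside the loop.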
Comprehensive tutorial on LightGBM hyperparameters and how to tune them using Optuna.
In the previous article, we talked about the basics of LightGBM and creating LGBM models that beat XGBoost in almost every aspect. This article focuses on the last stage of any machine learning project — hyperparameter tuning (if we omit model ensembling).
First, we will look at the most important LGBM hyperparameters, grouped by their impact level and area. Then, we will see a hands-on example of tuning LGBM parameters using Optuna — the next-generation Bayesian hyperparameter tuning framework.
Most importantly, we will do this in a similar…
Learn how to crush XGBoost in this comprehensive LightGBM tutorial.
I am confused.
So many people are drawn to XGBoost like moths to a flame. Yes, it has seen some glorious days in prestigious competitions, and it is still the most widely used ML library.
But it has been four years since XGBoost lost its top spot in terms of performance. In 2017, Microsoft open-sourced LightGBM (Light Gradient Boosting Machine), which delivers equally high accuracy with 2–10 times faster training.
This is a game-changing advantage considering the ubiquity of massive, million-row datasets. …
“I’m going to throw up over my RGB backlit-keyboard so hard if I see one more person using Titanic, Iris, Wine, or Boston datasets!”
This is the (slightly exaggerated) feeling you might gradually develop after being a data science learner for a while. You just can’t help it — everyone wants the easy thing. Beginners use these datasets because they are stupidly straightforward; most course creators and bloggers use them because they are just a single Google search away (or even bookmarked).
In 85+ articles I have written, I honestly can’t remember using any of those clichés (if my memory…
Do you know how many lines of code went into creating the Essence of Linear Algebra series of 3Blue1Brown?
They are snazzy…
They are breathtaking…
They remind you just how beautiful and powerful data visualization can be.
They only take a few seconds to deliver their message.
But take hours to create.
And require more than coding skills to master.
Such plots have a theme.
They don’t just visualize.
They speak to the observer.
Ultimately, you only have to watch and digest the information — no room for guessing or figuring out on your part.
So, watch and digest!
Kaggle is a hot spot for what is trending in data science and machine learning.
Due to its competitiveness, the top players are constantly looking for new tools, technologies, and frameworks that give them an edge over others. If a new package or an algorithm delivers actionable value, there is a good chance it receives immediate adoption and becomes popular.
This post is about seven such trending packages that are direct replacements for tools and technologies that are either outdated or in urgent need of an upgrade.
“I wish I could do this operation in Pandas….”
Well, chances are, you can!
Pandas is so vast and deep that it enables you to execute virtually any tabular manipulation you can think of. However, this vastness sometimes comes at a disadvantage.
Many elegant features that solve rare edge cases and unique scenarios are lost in the documentation, overshadowed by the more frequently used functions.
This article aims to rediscover those features and show you that Pandas is more capable than you ever knew.
ExcelWriter is a generic class for creating Excel files (with sheets!) and writing DataFrames to them. …
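A minimal sketch of that pattern — the filename and sheet names are illustrative, and writing `.xlsx` assumes an engine such as openpyxl is installed:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})

# One workbook, multiple sheets: each to_excel call targets its own
# sheet, and the context manager saves the file on exit.
with pd.ExcelWriter("report.xlsx") as writer:
    df1.to_excel(writer, sheet_name="first", index=False)
    df2.to_excel(writer, sheet_name="second", index=False)
```

Calling `df.to_excel("report.xlsx")` directly would overwrite the whole file each time; routing everything through one `ExcelWriter` is what lets the sheets coexist.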