
Top 1000 Writer on Medium | Top Writer in AI and Technology

Gradient-boost your XGBoost knowledge by learning these crucial lessons

Photo by Haithem Ferdi on Unsplash. All images are by the author unless specified otherwise.

XGBoost is a real beast.

It is a tree-based workhorse behind the winning solutions of many tabular competitions and datathons. Currently, it is the “hottest” ML framework of the “sexiest” job in the world.

While basic modeling with XGBoost can be straightforward, you need to master the nitty-gritty to achieve maximum performance.

With that said, I present to you this article, which is the result of

  • hours of reading the documentation (it wasn’t fun)
  • crying through some awful but useful Kaggle kernels
  • hundreds of Google keyword searches
  • completely exhausting my Medium membership by reading a lotta articles


Learn what the kool kids are doing

Photo by Pixaline on Pixabay.


On Kaggle, everyone knows that to win a tabular competition, you need to out-feature-engineer everyone else. Almost anyone can perform awesome EDA, develop a validation strategy, and tune hyperparameters to squeeze out every bit of model performance.

The key to the top is always feature engineering, and it is not something taught in tutorials, books, or courses. It is all about creativity, experience, and domain knowledge.

With the addition of the time component, feature engineering becomes even more important in time-series forecasting challenges. This has been proven once again by the top players participating in this month’s (July) TPS Playground competition.
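To make the idea concrete, here is a hedged sketch of the kind of time-based features competitors typically start from, on a made-up daily sales series (the column names, window sizes, and lags are illustrative, not a recipe from the competition):

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales series for illustration
idx = pd.date_range("2021-01-01", periods=60, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"sales": rng.normal(100, 10, 60)}, index=idx)

# Lag features: yesterday's and last week's value
df["lag_1"] = df["sales"].shift(1)
df["lag_7"] = df["sales"].shift(7)

# Rolling statistics smooth out short-term noise
df["rolling_mean_7"] = df["sales"].rolling(7).mean()

# Calendar features expose the time component to tree models
df["dayofweek"] = df.index.dayofweek
df["month"] = df.index.month
```

Lags, rolling windows, and calendar fields are the boilerplate; the creativity the article talks about is in deciding which ones your target actually responds to.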

Discover the most popular tools to implement object storage systems for free

Storage units. Photo by Joshua Coleman on Unsplash.

As noted in Forbes, more than 80% of the data in organizations today is unstructured. Traditionally, companies have ignored this type of data because of the challenges of analyzing it and generating meaningful insights. However, the landscape is rapidly changing as new kinds of storage systems emerge: block-, file-, and object-based.

Among the three, object storage seems most promising, which is proven by the fact that Goliaths like Amazon, Google, and IBM already offer enterprise solutions for object-based data repositories.

While such commercial options certainly offer many features, it is worth exploring free alternatives…

Algorithms can’t handle non-stationarity. They need static relationships.

Photo by Jonathan Pielmayer on Unsplash.


Unlike ordinary machine learning problems, time series forecasting requires extra preprocessing steps.

On top of normality assumptions, most ML algorithms expect a static relationship between the input features and the output.

A static relationship requires inputs and outputs whose statistical properties, such as the mean and variance, stay constant over time. In other words, algorithms perform best when the inputs and outputs are stationary.

This is not the case in time series forecasting. Distributions that change over time can have unique properties such as seasonality and trend. …
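A quick sketch of what this looks like in practice, on a synthetic trending series. First-order differencing is the standard first fix; it removes a linear trend and leaves a series with a roughly constant mean (the slope and noise level here are made up):

```python
import numpy as np
import pandas as pd

# Synthetic non-stationary series: a linear trend plus noise
t = np.arange(200)
noise = np.random.default_rng(1).normal(0, 1, 200)
series = pd.Series(0.5 * t + noise)

# The mean clearly drifts over time -- the series is not stationary
first_half_mean = series.iloc[:100].mean()
second_half_mean = series.iloc[100:].mean()

# First-order differencing removes the linear trend,
# leaving a series with a roughly constant mean
diff = series.diff().dropna()

print(first_half_mean, second_half_mean, diff.mean())
```

After differencing, the mean of the series hovers around the trend’s slope instead of growing without bound, which is exactly the “static relationship” the algorithms want.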

Find out if the target is worth forecasting

Photo by Tobi on Pexels.


No matter how powerful, machine learning cannot predict everything. One well-known area where it can be pretty helpless is time series forecasting.

Despite the availability of a large suite of autoregressive models and many other algorithms for time series, you cannot predict the target distribution if it is white noise or follows a random walk.

So, you must detect such distributions before you make further efforts.

In this article, you will learn what white noise and random walk are and explore proven statistical techniques to detect them.
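As a small taste of such detection (a hand-rolled sketch, not the article’s exact method): a simple lag-1 autocorrelation check already separates the two cases, since white noise has autocorrelation near zero at every lag, while a random walk is almost perfectly correlated with its previous step:

```python
import numpy as np

rng = np.random.default_rng(42)
white_noise = rng.normal(0, 1, 1000)          # i.i.d. draws
random_walk = np.cumsum(rng.normal(0, 1, 1000))  # cumulative sum of noise

def lag1_autocorr(x):
    """Lag-1 autocorrelation: near 0 for white noise, near 1 for a random walk."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

print(lag1_autocorr(white_noise))   # close to 0
print(lag1_autocorr(random_walk))   # close to 1
```

The formal tools covered later (ACF plots, statistical tests) are refinements of this same intuition.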

Before we start…

This is my third article in this series on time series forecasting…

And other techniques to find relationships between multiple time series

Photo by Jordan Benton on Pexels


Following my very well-received post and Kaggle notebook on every single Pandas function to manipulate time series, it is time to take the trajectory of this TS project to visualization.

This post is about the core processes that make up an in-depth time series analysis. Specifically, we will talk about:

  • Decomposition of time series — seasonality and trend analysis
  • Analyzing and comparing multiple time series simultaneously
  • Calculating autocorrelation and partial autocorrelation, and what they represent
  • Whether seasonality or trends among multiple series affect each other

Most importantly, we will build some very cool visualizations, and this image should be…
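As a small preview of the autocorrelation part: checking the autocorrelation at candidate seasonal lags can already reveal periodicity. Here is a sketch on a synthetic daily series with yearly seasonality (the lags, amplitudes, and trend slope are made up for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic daily series: mild upward trend + yearly (365-day) seasonality
idx = pd.date_range("2018-01-01", periods=365 * 3, freq="D")
t = np.arange(len(idx))
s = pd.Series(10 + 0.01 * t + 5 * np.sin(2 * np.pi * t / 365), index=idx)

# Autocorrelation at a few lags: a spike at the seasonal lag (365)
# signals yearly periodicity
for lag in (1, 182, 365):
    print(lag, round(s.autocorr(lag=lag), 2))
```

A high value at lag 365 relative to off-season lags is the telltale sign that the series repeats on a yearly cycle, which is what the decomposition plots will visualize in detail.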

From basic time-series metrics to window functions

Photo by Jordan Benton on Pexels

Introduction to this project on Time Series Forecasting

Recently, the Optiver Realized Volatility Prediction competition has been launched on Kaggle. As the name suggests, it is a time series forecasting challenge.

I wanted to participate, but it turned out my knowledge of time series didn’t even begin to suffice for a competition of that magnitude. So, I accepted this as the ‘kick in the pants’ I needed to start paying serious attention to this large sphere of ML.

As the first step, I wanted to learn and teach every single Pandas function you can use to manipulate time-series data. …
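To give a flavor of what that covers, here are a few of the workhorse Pandas time-series functions, sketched on a toy daily series (the full function-by-function tour is in the article and notebook):

```python
import numpy as np
import pandas as pd

# Toy daily series: values 1..90 over three months
idx = pd.date_range("2021-01-01", periods=90, freq="D")
ts = pd.Series(np.arange(1, 91, dtype=float), index=idx)

weekly = ts.resample("W").mean()        # downsample to weekly means
smoothed = ts.rolling(window=7).mean()  # 7-day moving average (window function)
pct = ts.pct_change()                   # day-over-day percent change
```

`resample`, `rolling`, and `pct_change` alone cover a surprising share of everyday time-series wrangling; the rest of the article fills in the long tail.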

A deep yet rapid comparison across 7 key aspects

Goofy Image by Author

There is an annoying habit of soccer fans. Whenever a young but admittedly exceptional player emerges, they compare him to legends like Messi or Ronaldo.

They choose to forget that the legends had been dominating the game since before the newcomers had grown their permanent teeth.

Comparing Plotly to Matplotlib was, in a sense, similar to that in the beginning. Matplotlib had been in heavy use since 2003, and Plotly had just come out in 2014.

Many were bored with Matplotlib by this time, so Plotly was warmly welcomed for its freshness and interactivity. …

What it does, how it works, its flaws, and whether you should be worried as a data scientist…

What can a $1 billion investment buy?

On Tuesday this week, OpenAI and GitHub answered this question boldly with the preview of a new AI tool — GitHub Copilot. It can write working, context-aware code and is much better at the task than its predecessor, GPT-3.

Copilot autocompletes code snippets, suggests new lines of code, and can even write whole functions based on the description provided. According to the GitHub blog, the tool is not just a language-generating algorithm based on user input — it is a virtual pair programmer.

It learns and adapts to the user’s coding habits, analyzes…

Let’s get interactive with Plotly!

Photo by Brian McGowan on Pexels


For the past few years, there has been an explosion of interest in interactive plots. People were bored out of their minds with the old, static plots that had been around since before they were born. This was understandable because, unlike other aspects of data science that change rapidly, there had not been many advancements in data visualization. Matplotlib, Seaborn, and ggplot had been dominating the game, and many people wanted a change.

Then, Plotly came.

In 2015, the widely adopted Plotly.js data visualization framework was open-sourced for Python and R, making Plotly the most downloaded graphing library in the…

Bex T.
